On very large scales, density fluctuations in the Universe are small, suggesting a perturbative model for large-scale clustering of galaxies (or other dark matter tracers), in which the galaxy density is written as a Taylor series in the local mass density, $\delta$, with the unknown coefficients in the series treated as free "bias" parameters. We extend this model to include dependence of the galaxy density on the local values of $\nabla_i\nabla_j\phi$ and $\nabla_i v_j$, where $\phi$ is the potential and $v$ is the peculiar velocity. We show that only two new free parameters are needed to model the power spectrum and bispectrum up to 4th order in the initial density perturbations, once symmetry considerations and equivalences between possible terms are accounted for. One of the new parameters is a bias multiplying $s_{ij}s_{ji}$, where $s_{ij} = [\nabla_i\nabla_j\nabla^{-2} - \frac{1}{3}\delta^K_{ij}]\,\delta$. The other multiplies $s_{ij}t_{ji}$, where $t_{ij} = [\nabla_i\nabla_j\nabla^{-2} - \frac{1}{3}\delta^K_{ij}]\,(\theta - \delta)$, with $\theta = -(a H\, d\ln D/d\ln a)^{-1}\,\nabla\cdot v$. (There are other, observationally equivalent, ways to write the two terms, e.g., using $\theta - \delta$ instead of $s_{ij}s_{ji}$.) We show how short-range (non-gravitational) non-locality can be included through a controlled series of higher derivative terms, starting with $R^2\nabla^2\delta$, where $R$ is the scale of non-locality (this term will be a small correction as long as $k^2R^2$ is small, where $k$ is the observed wavenumber). We suggest that there will be much more information in future huge redshift surveys in the range of scales where beyond-linear perturbation theory is both necessary and sufficient than in the fully linear regime.


I. INTRODUCTION 

While measurements of galaxy clustering have been around for a long time [1], to the point where the casual observer might think they must surely be almost finished, or at least well underway, in fact we have barely scratched the surface of the possibilities for measuring large-scale structure (hereafter, LSS, defined in this paper to mean surveys of any tracer of the large-scale mass density field - we will often call the tracer "galaxies", but it could just as well be quasars [2, 3], the Lyα forest [4, 5, 6], galaxy cluster/Sunyaev-Zel'dovich effect measurements [7], 21cm surveys [8], etc.). Measuring LSS should really be regarded as an exciting future probe of cosmology, with growth potential not a priori less than probes with less past success. The reason is simply that we have so far probed only a tiny fraction of the observable volume of the Universe. For example, the largest galaxy redshift survey with density approaching what is needed to fully sample the near-linear regime of clustering, the Sloan Digital Sky Survey (SDSS) Luminous Red Galaxy (LRG) survey [9], probes $< 2\,({\rm Gpc}/h)^3$, or ~0.3% of the comoving volume at $z < 5$. Figure 1 shows that the fraction of linear regime modes, i.e., easily usable information, probed by the LRGs is even smaller - barely 0.01% of the modes at $z < 5$ - because the non-linear scale is smaller at higher $z$. (For this figure, we have taken the non-linear scale to be $k = 0.1/[D(z)/D(0)]\,h\,{\rm Mpc}^{-1}$, where $D$ is the linear growth factor. The normalization $0.1\,h\,{\rm Mpc}^{-1}$ is somewhat arbitrary, depending on one's definition of the non-linear scale, but changing it only changes the overall normalization of the figure. The redshift dependence is motivated by (TTL . )

FIG. 1: Cumulative number of modes with $k < 0.1/[D(z)/D(0)]\,h\,{\rm Mpc}^{-1}$ up to a given redshift. The largest reasonably well-sampled LSS survey, the SDSS LRGs, probes only a tiny fraction of the available modes.

The high precision of LSS statistics measured using future surveys probing appreciable fractions of the observable Universe g GJ, Q M M EE M I20I l2l| will require an unprecedented level of accuracy in our theoretical/phenomenological calculations of predictions for the statistics, if we are to fully exploit the potential of these surveys for measuring fundamental physics/cosmology. On very large scales we can use linear theory, but the scale below which linear theory cannot be trusted at the level of the error bars will become larger and larger (corresponding to a smaller and smaller maximum reliable wavenumber $k$) as the error bars shrink. The number of Fourier modes in a three-dimensional survey goes like the cube of the maximum usable $k$, i.e., in terms of raw information, extending the usable range of $k$ by a factor of 2 is equivalent to extending the volume of the survey by a factor of 8 (for a Gaussian field). As we will see (Fig. HJ, the range of scales where corrections to linear theory are small (perturbative), but still statistically significant, can easily be a factor of ~4 for future large surveys. The point is simply that we have enormous leverage to extend the value of surveys through modeling improvements that extend the usable range of $k$. For example, if a survey costs 50 million dollars, extending the effectively usable $k$ range by a mere factor of 1.3 (say, from $0.1\,h\,{\rm Mpc}^{-1}$ to $0.13\,h\,{\rm Mpc}^{-1}$) would be worth roughly 1000 person-years (at $60000 per year). Phenomenological theory associated with LSS surveys should be viewed not as a typical academic exercise, pursued by a few individuals or small groups because they think it is "interesting", but instead as an industrial, infrastructure-building endeavor, critical to surveys in much the same way as, say, the road up to the telescope.
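The cost estimate above is simple arithmetic, and a short sketch makes the scaling explicit. It assumes, as in the text, a Gaussian field for which the number of usable Fourier modes scales as the cube of the maximum usable k, and it prices the extra modes as if they were extra survey volume. The function names are hypothetical and the inputs are the illustrative figures quoted above.

```python
def mode_count_ratio(k_ratio: float) -> float:
    """Growth in the number of Fourier modes when k_max is scaled by k_ratio (N goes as k_max^3)."""
    return k_ratio ** 3


def equivalent_person_years(survey_cost: float, k_ratio: float, salary: float) -> float:
    """Value of the extra modes, priced as extra survey volume, in person-years."""
    extra_volume_fraction = mode_count_ratio(k_ratio) - 1.0
    return survey_cost * extra_volume_fraction / salary


# Doubling k_max is equivalent to an 8 times larger survey volume.
print(mode_count_ratio(2.0))

# A $50M survey, k_max extended by a factor of 1.3, at $60000 per person-year.
print(equivalent_person_years(50e6, 1.3, 60e3))
```

With these inputs the second number comes out just under 1000, which is the "roughly 1000 person-years" of the text.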

Better modeling is needed even for present, moderate precision surveys. For example, [22] shows clearly where the linear bias model [23] that we have been relying on for cosmological parameter estimation for decades is breaking down, by comparing results from SDSS and 2dF galaxies (see also [24, 25]). The power spectra of two different types of galaxies are not related by a simple overall normalization factor (bias) - their ratio depends on scale, even on quite large scales where it was once hoped that linear theory would be good enough. This was not completely unanticipated; however, [22] also shows that the ad hoc fitting formula of [26], which has been used recently to try to account for quasi-linear galaxy clustering, does not work well, and these problems lead to disagreement between cosmological parameters inferred from different galaxy surveys (see also [27]). Clearly, we have a lot of theoretical work to do if we want to fully exploit future, much more precise, LSS data.

For measurements of the baryonic acoustic oscillation (BAO) feature [IH, [H, [29|, H(| HH, [H, HH, HI, HI, HI] , ad hoc 
fitting formulas very carefully calibrated by simulations may be sufficient, but measuring other physics that produces 
less distinctive signatures in the power spectrum, e.g., redshift-space distortions aimed at constraining dark energy 
[13, HI, [H, H3], or measurements of the shape of the power spectrum aimed at constraining modified gravity [41, 42],
neutrino masses [H, [H, [H, [H, H3, El, [H, HO, HH, [Hi] , inflation [H, etc. [11, HI HI , will require well-motivated, 
rigorous descriptions of the relation between galaxy and mass density, i.e., bias models. In other words, better LSS 
theory will substantially enhance the constraining power of BAO-oriented surveys, by allowing the use of non-BAO 
information [56].

Bias modeling can be roughly divided into two approaches (excluding attempts to simulate galaxies from something 
resembling first principles [57, 58], which can be useful as a guide/spot-check for other methods, but are unlikely to be
accurate and efficient enough to use for interpretation of precision statistics any time soon): The first approach might 
be called a bottom-up approach, where one starts with a model for how individual galaxies sit in the local small-scale 
mass density field (most recently almost always based on galaxies sitting in dark matter halos, but earlier on peaks 
or other features), and then computes large-scale clustering by including the large-scale correlation of the relevant 
small-scale density feature. The other approach might be called top-down, or perturbative, where one starts from the 
fact that large-scale fluctuations are small and expands a completely unknown relation between galaxies and mass, 
with generally infinite freedom (except typically for the assumption of locality, relative to the scale of observations) 
into a Taylor series in the density perturbations, where the coefficients of the first few terms in the series become the 
free parameters of the model (the main point of the renormalized bias scheme of [59] was to demonstrate how this
separation of scales can be done in an organized way - see [60] for a general review of LSS perturbation theory).

This paper takes the perturbative approach, but most recent work has been based in some way on dark matter halos
(e.g., [61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74]). A strong foundation for halo models is the expectation
that, with enough work, it should be possible to make accurate numerical simulations of the large-scale clustering of
halos within a given cosmological model [75] (it is much more difficult to fully quantify this clustering to the point
where one does not need to make halo models based on the halos in full simulations, but that is only necessary for
convenience). Unfortunately, we can see these halos only through the coarse probe of gravitational lensing [76], and
it is not straightforward to determine the relation between halos and the more easily observable galaxies. The halo 
models therefore specify a "halo occupation distribution" (HOD) for the galaxies, i.e., a recipe for populating halos 
with galaxies. The hope of these models is that they can determine the HOD using information deeper into the 
non-linear regime than possible using the more general, less predictive, perturbative approach that we will discuss, 
but this is a difficult game. To be reliable, models that populate halos within a full numerical simulation must include 
enough freedom in the method for populating halos to cover all realistic possibilities. Models that further rely on 
analytic calculations for the clustering of halos introduce another level of complexity and possibility of error [77, 78].

To appreciate the small-scale complexity that we will bundle into a few perturbative bias parameters, it is useful to 
review the recent work toward understanding the details of halo models. The standard HOD assumption is that the 
number of galaxies in a halo is some relatively simple function of the mass of the halo. Even these relatively simple 
HODs have ~10 free parameters [63]. There is observational evidence that this form of HOD works qualitatively very
well [79]; however, the assumptions involved clearly cannot be perfect. [80] showed that the clustering of halos of a
fixed mass depends significantly on the time when the halo formed (see also [81, 82, 83]). This phenomenon is often
called assembly bias. When combined with the possibility that the galaxy population within halos of a given mass 
can depend on the halo formation time, this means that it is necessary for the HOD to depend on more parameters 
than just mass. [84] demonstrated this explicitly using semi-analytic models for galaxy formation (see also [85]), and
found that accounting for formation time or halo concentration in addition to mass explains only a fraction of the 
effect. [86] found that the magnitude and mass-dependence of the assembly bias depend on the definition of halo
formation time (different definitions capture different aspects of the history of the halo). [13] extends these results 
to higher order statistics. [88] showed that the clustering of massive halos depends on concentration in addition to
mass, and also recent history of mergers. The simulations of [89, 90] suggest that the relation between formation time 
and clustering for small halos is due to the effect of tides in high density regions suppressing later growth of small 
halos. The simulations and analytic calculations of [91] suggest that at low masses assembly bias is again related to
high density regions suppressing late-time accretion, and at high masses the effect is related to the curvature around 
the initial peak that grows into the halo. The simulations of [92] show that the clustering of halos at high redshift
also depends significantly on their angular momentum, at fixed mass. Finally, simulations even show a population 
of halos that were once subhalos within a larger halo, but were ejected by interactions [93]. Not surprisingly, the 
ejected halos do not cluster in the same way as other halos of the same mass. Generally, the idea that the mass 
density field breaks up neatly into halos, containing galaxies, which retain little information about their formation 
process, is a great qualitative way to picture the formation of structure, but we should not forget that it is a picture, 
not a calculation. Another assumption of typical halo models is that the distribution of satellite galaxies within dark 
matter halos follows the mass density profile, but this has been only roughly justified [94, 95, 96, 97]. Explanations
of why these issues are not fundamental problems for the HOD approach make the argument that the effects are not
large enough to matter now, but not that they will not in the future [63].

In the face of any uncertainty about whether the small-scale halo model is sufficient, a precision measurement 
of fundamental physics/cosmology that is consistent with prior expectations may be believed, but a truly new, 
unexpected result will not be. This is only a very meager form of progress. The same kind of thing can be said about 
the perturbative approach - as long as there is any question of whether the bias description is complete, the results 
will not be believed in any important situation. We believe that it is reasonable to hope that the perturbative bias 
approach can be made relatively airtight, as long as one does not try to push it beyond its range of validity. This 
paper is an attempt to make progress in that direction. 

General understanding of large-scale clustering, independent of specific small-scale models for the dark matter tracer, has been developing gradually. [98] showed that if the galaxy density is a general function of the local mass density, and the mass density field is assumed to be Gaussian, the asymptotically large-scale galaxy correlation function will be proportional to the mass correlation function (except for special cases of the local function). [98] also showed that, under the same conditions, the galaxy power spectrum may go to a constant as $k \to 0$ (even if no white noise is introduced by hand). [99] introduced the perturbative bias model in the form that we will follow, where the galaxy density perturbation $\delta_g$ is first written as a completely general function, $f(\delta)$, of the mass density perturbation $\delta$, and then the function is Taylor expanded, with the unknown coefficients in the series becoming the bias parameters, $b_i$, i.e.,

$$\delta_g = f(\delta) = \sum_{i=0}^{\infty} \frac{b_i}{i!}\,\delta^i, \tag{1}$$

with the mass density given by gravitational perturbation theory. Note that the observation that the first order term in this series describes simple scale-independent linear bias does not guarantee that higher order terms cannot cause large-scale deviations from this form. [100] showed, starting with the same Taylor series form of bias, that if the mass clustering is hierarchical, then $\xi_g(r) \propto \xi(r) + O(\xi^2)$, even if the local bias relation is applied on scales where the fluctuations are not small. The large-scale bias factor found by [100] was an infinite sum of terms proportional to powers of the mass density variance, a foreshadowing of the renormalized bias approach we follow in this paper [59]. They went on to show that the linear bias relation holds even if the local mass density does not determine the galaxy density uniquely, but only determines a random distribution for the galaxy density (with the randomness in that distribution independent from point to point). Finally, [100] showed that the galaxy power spectrum obeys the linear bias relation on scales similar to the correlation function, except that the small-separation part of the correlation function, which deviates from linear bias, will contribute an added constant to the power spectrum (see also [101, 102, 103]), a foreshadowing of the noise renormalization that we will employ [1^]. [104] found similarly that higher order corrections in straightforward gravitational perturbation theory starting from the local Taylor series model for bias produce terms that on large scales look like modifications of the linear theory bias or additional shot-noise. Generally, it has been pretty well established that linear bias plus white noise is the correct model for very large scale galaxy clustering [105, 106, 107], barring the introduction of long-range non-gravitational effects which essentially introduce deviations from this form by hand. [59] put these results together into a neat computational package, by employing renormalization ideas from quantum field theory [108] (some similar ideas were present in [109]). The inconvenient results of [100, 104], that higher order calculations can affect clustering statistics on arbitrarily large scales, and that these corrections are sensitive to the assumed small scale smoothing (cutoff), are rendered observationally irrelevant by absorbing the inconvenient pieces into renormalizations of the existing bias parameters (including the noise level). This approach clears the way for pushing, in a systematic way, beyond the very large-scale, purely linear, regime and into the information-rich smaller scales where higher order corrections are non-negligible, and understanding the smoothing/cutoff issue becomes critical. [56] showed that this approach describes clustering in simulations very well.

Remarkably, for all of the work on both the halo-based and perturbative approaches to bias, neither has generally been adopted, beyond the papers in which they are proposed, for use in the main stream of LSS power spectrum measurement and cosmological parameter estimation [25, 110, 111]. In fact, even the proposers generally have not pushed their methods through to the point of making comprehensive parameter measurements (see [112] for an exception). The widespread use of the demonstrably inadequate (when extrapolated beyond its original purpose) fitting formula of [26] should really be seen as an embarrassing failure of the LSS theory community. This paper will, unfortunately, continue this legacy of failure, but with the hope that it can soon be rectified.

In this paper, we will improve the Eulerian bias model by allowing for dependence on the local velocity divergence and shear and the tidal tensor in addition to density. The reason to expect such dependence at some level is simple: two patches of space with the same final density did not necessarily follow the same path to reach that density, and that difference in history may affect the galaxy density at the time of observation. In perturbation theory up to some finite order, however, the entire density history of a patch is reconstructible given a finite number of local quantities like the velocity divergence and tidal tensor. This raises the hope that a completely unique, general, bias model can be constructed, covering all possibilities for large-scale clustering with a finite set of bias parameters. (One can always imagine unavoidable obstacles to this, e.g., long-range non-gravitational effects like inhomogeneous reionization affecting clustering [113, 114] - however, to the extent that something like this is important on a given scale, very high precision cosmology is probably simply impossible on that scale.) While the primary philosophy of this paper is that any possible form of large-scale clustering should be included in the model, unless it can be compellingly rejected, there is actually a lot of evidence that these new forms of bias are needed, related to the assembly bias phenomenon seen in simulations [89] or observational correlations between galaxy properties and their environment [115].




In a very interesting paper, [116, 117] point out that a perturbative bias model assumed to be local in initial Lagrangian density produces results distinct from the model assumed to be local in final Eulerian density (see also [118, 119]). While [116] presents this as an advantage of Lagrangian PT, which is supposed to be a more correct way to look at bias, we believe that it is better to say that this represents a deficiency in the development of one or both approaches, not a conceptual problem with either. As a first approximation, it may be more accurate to assume that bias is local in the initial Lagrangian density than the final Eulerian density, but neither assumption can be rigorously justified. Barring the unlikely proof that one approach is fundamentally superior to the other, one criterion for believing future very high precision cosmology measurements should be that Lagrangian and Eulerian PT give equivalent answers in regimes where the calculations converge, once all possible freedom is included in each version of the bias model. We prefer to work with the Eulerian model simply because it is expressed in terms of quantities that are generally more directly observable. This paper will implicitly address the differences between Lagrangian



Note that, while we primarily discuss results in terms of the power spectrum, nothing about the perturbative 
approach intrinsically requires one to go to Fourier space. It is simple to obtain the correlation function by Fourier 
transforming the power spectrum, but it is also possible to do all of the same calculations, from scratch, in configuration 
space. 

The plan of the rest of the paper is as follows: In §II we discuss the primary new extensions to the Eulerian bias model that we will work out fully in this paper: including dependence on the local large-scale tidal tensor and velocity divergence and shear. In §III we briefly discuss some further extensions that are implied by the same line of thinking, related to redshift-space distortions, short-range non-locality, and non-Gaussianity of the primordial perturbations, although we will not fully develop them. Finally, in §IV we will give some conclusions and thoughts on directions for future work.



II. A MORE GENERAL EULERIAN BIAS MODEL

In this section we lay out a baseline extension to the model of galaxy bias as dependent on local density only. In §II A we discuss the variables we will allow the galaxy density to depend on, and in §II B we compute statistics of galaxy clustering using these variables.

A. Independent variables

This subsection seeks to answer the question: In general, in principle, in perturbation theory, what can the galaxy density depend on?

Everything we know about LSS at a given time in standard perturbation theory (PT) is contained in the dynamical variables $\delta(\mathbf{x}) = \rho_m(\mathbf{x})/\bar\rho_m - 1$, where $\rho_m(\mathbf{x})$ is the mass density at position $\mathbf{x}$ and $\bar\rho_m$ is the mean mass density, and $\theta(\mathbf{x}) = \nabla\cdot\mathbf{v}(\mathbf{x})$, where $\mathbf{v}$ is the peculiar velocity (see [60] for a review of LSS PT - note that we will make the usual approximation that the Einstein-de Sitter PT results can be used for other models as long as the linear growth factor is replaced by the growth factor in the desired model). Because the velocity field is curl-free, it can be derived from $\theta$, i.e., $v_i = \nabla_i\nabla^{-2}\theta$ ($\nabla^2 \equiv \nabla_i\nabla_i$, and $\nabla^{-2}$ represents the usual $r^{-1}$ potential integral, or $-k^{-2}$ in Fourier space). To allow for the non-locality (in the density field) introduced by gravity, we will also consider dependence of the galaxy density on the local potential field, $\phi(\mathbf{x})$, which can always be derived from $\delta$ using the Poisson equation. Allowing dependence on $\mathbf{v}(\mathbf{x})$ and $\phi(\mathbf{x})$, in spite of the fact that the system is entirely determined by $\delta(\mathbf{x})$ and $\theta(\mathbf{x})$, can be understood as allowing for history dependence of the number of galaxies in a given patch of space, i.e., these quantities tell us about the path the patch took to get to the density and velocity divergence that it has.

A homogeneous change in $\phi$ should not be observable, which suggests that the galaxy density should only depend on $\nabla_i\phi$. Furthermore, a homogeneous gravitational force should not be observable either, suggesting that we should use $\nabla_i\nabla_j\phi$. Therefore we define:

$$s_{ij}(\mathbf{x}) \equiv \nabla_i\nabla_j\phi(\mathbf{x}) - \frac{1}{3}\delta^K_{ij}\,\delta(\mathbf{x}) = \gamma_{ij}\,\delta(\mathbf{x}), \tag{2}$$

where we have removed the trace of $\nabla_i\nabla_j\phi$ because it is redundant with $\delta$ (note that we are absorbing all of the spatially constant factors in the Poisson equation into the definition of $\phi$, i.e., $\nabla^2\phi = \delta$ - we will make a similar re-definition of $v_i$ to make $\theta = \delta$ in linear theory). For compactness, we have defined the operator

$$\gamma_{ij} \equiv \nabla_i\nabla_j\nabla^{-2} - \frac{1}{3}\delta^K_{ij}. \tag{3}$$



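Similarly, a homogeneous velocity field should not be observable, suggesting that galaxy density depends on velocity through $\nabla_i v_j = \nabla_i\nabla_j\nabla^{-2}\theta$. Because $\theta = \delta$ at linear order, $\nabla_i v_j$ is redundant with $\nabla_i\nabla_j\phi$ at linear order, so it simplifies things in perturbation theory to use their difference for our independent variables, i.e., to define

$$\eta(\mathbf{x}) \equiv \theta(\mathbf{x}) - \delta(\mathbf{x}) \tag{4}$$

and

$$t_{ij}(\mathbf{x}) \equiv \nabla_i v_j(\mathbf{x}) - \frac{1}{3}\delta^K_{ij}\,\theta(\mathbf{x}) - s_{ij}(\mathbf{x}) = \left[\nabla_i\nabla_j\nabla^{-2} - \frac{1}{3}\delta^K_{ij}\right]\left[\theta(\mathbf{x}) - \delta(\mathbf{x})\right] = \gamma_{ij}\,\eta(\mathbf{x}). \tag{5}$$

The difference variables $\eta$ and $t_{ij}$ are non-zero only at 2nd order.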

Now, the galaxy density will depend on $\delta$, $s_{ij}$, $\eta$, and $t_{ij}$, but it can't depend directly on anything but a scalar quantity. This is because, assuming homogeneity and isotropy, we can only have constant, scalar, bias parameters. For example, the general Taylor series for a function that depends on a small tensor $a_{ij}$ is

$$f(a_{ij}) = f(0) + \left.\frac{\partial f}{\partial a_{ij}}\right|_{0} a_{ij} + \ldots \equiv \rho_0 + \rho_{ij}\,a_{ij} + \ldots \tag{6}$$

In general, each element of $\rho_{ij}$ could be independent, but this is inconsistent with isotropy. The only consistent possibility is $\rho_{ij} = \rho_1\,\delta^K_{ij}$. In this case, only the trace $a_{ii}$ enters the Taylor series. Similar arguments apply to higher order terms.

By construction, $s_{ii} = 0$ and $t_{ii} = 0$. We can construct products, up to 3rd order in the initial perturbations, $s^2 \equiv s_{ij}s_{ji}$, $st \equiv s_{ij}t_{ji}$, and $s^3 \equiv s_{ij}s_{jk}s_{ki}$ ($t_{ij}t_{ji}$ is 4th order). It turns out that, at 2nd order in PT, $\eta = \frac{2}{7}s^2 - \frac{4}{21}\delta^2$. This suggests that, in place of $\eta$, we use a variable constructed to be zero at both 1st and 2nd order in standard PT,

$$\psi(\mathbf{x}) \equiv \eta(\mathbf{x}) - \frac{2}{7}s^2(\mathbf{x}) + \frac{4}{21}\delta^2(\mathbf{x}). \tag{7}$$

This definition makes $\psi$ non-zero only at 3rd order. Note that we cannot redefine $t_{ij}$ in terms of $\psi$ because this would require terms like $\gamma_{ij}\delta^2$. To summarize, our galaxy density will (naively) be a Taylor series involving the following eight quantities:

1st order: $\delta$
2nd order: $\delta^2$, $s^2$
3rd order: $\delta^3$, $\delta s^2$, $\psi$, $st$, $s^3$          (8)

This shows why standard linear theory bias, $\delta_g = b\,\delta$, is sufficient in the truly linear regime: all other independent scalar quantities we can form are higher order.
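The statement that $\eta$ reduces to a combination of $s^2$ and $\delta^2$ at 2nd order can be checked numerically. The sketch below is an illustration under stated assumptions, not the paper's own calculation: it uses the standard Einstein-de Sitter second-order kernels $F_2$ (density) and $G_2$ (velocity divergence, normalized so that $\theta = \delta$ at linear order), takes the mode-coupling kernel of $s^2$ to be $\mu^2 - 1/3$ and that of $\delta^2$ to be 1, with $\mu$ the cosine of the angle between the two wavevectors. All function names are hypothetical.

```python
import numpy as np


def F2(k1, k2, mu):
    """Standard EdS symmetrized 2nd-order density kernel (assumed form)."""
    return 5.0 / 7.0 + 0.5 * mu * (k1 / k2 + k2 / k1) + (2.0 / 7.0) * mu**2


def G2(k1, k2, mu):
    """Standard EdS symmetrized 2nd-order velocity-divergence kernel (assumed form)."""
    return 3.0 / 7.0 + 0.5 * mu * (k1 / k2 + k2 / k1) + (4.0 / 7.0) * mu**2


def S(mu):
    """Kernel of s^2 = s_ij s_ji built from two linear modes."""
    return mu**2 - 1.0 / 3.0


def psi_second_order_kernel(k1, k2, mu):
    """Kernel of eta - (2/7) s^2 + (4/21) delta^2 at 2nd order; should vanish."""
    eta_kernel = G2(k1, k2, mu) - F2(k1, k2, mu)
    return eta_kernel - (2.0 / 7.0) * S(mu) + 4.0 / 21.0


rng = np.random.default_rng(0)
k1 = rng.uniform(0.01, 1.0, 1000)
k2 = rng.uniform(0.01, 1.0, 1000)
mu = rng.uniform(-1.0, 1.0, 1000)
residual = psi_second_order_kernel(k1, k2, mu)
print(np.max(np.abs(residual)))  # zero up to floating-point rounding
```

The dipole pieces of the two kernels cancel in the difference, leaving only the $\mu^2$ and constant parts, which is why a combination of $s^2$ and $\delta^2$ is enough to remove the 2nd-order part of $\eta$.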

Finally, our model, which now starts with $\rho_g = f(\delta, \nabla_i\nabla_j\phi, \nabla_i v_j)$, will be extended to include general dependence on a mean-zero Gaussian white noise variable $\epsilon$, i.e., $\rho_g = f(\delta, \nabla_i\nabla_j\phi, \nabla_i v_j, \epsilon)$, to allow for stochasticity and shot-noise in the galaxy density-mass density relation. This approach is new relative to past work where a noise variable was simply tacked onto the end of the Taylor series. We will Taylor expand around $\epsilon = 0$, just like the other variables, treating $\epsilon$ as similar in size to $\delta$, and including all higher order terms. This may appear strange, and actually will not affect power spectrum calculations at all, but we will see when we compute the bispectrum that this is a compact way to include the fact that Poisson sampling of the density field actually affects the bispectrum, in contrast to Gaussian noise [ll, 120].

A Taylor series in these quantities, up to 3rd order in the initial perturbations, is

$$\begin{aligned}
\rho_g = {} & \rho_0 + \rho_\delta\,\delta + \frac{1}{2}\rho_{\delta^2}\,\delta^2 + \frac{1}{2}\rho_{s^2}\,s^2 + \frac{1}{3!}\rho_{\delta^3}\,\delta^3 + \frac{1}{2}\rho_{\delta s^2}\,\delta s^2 + \rho_\psi\,\psi + \rho_{st}\,st + \frac{1}{3!}\rho_{s^3}\,s^3 \\
& + \rho_\epsilon\,\epsilon + \rho_{\delta\epsilon}\,\delta\epsilon + \frac{1}{2}\rho_{\delta^2\epsilon}\,\delta^2\epsilon + \frac{1}{2}\rho_{s^2\epsilon}\,s^2\epsilon + \frac{1}{2}\rho_{\epsilon^2}\,\epsilon^2 + \frac{1}{2}\rho_{\delta\epsilon^2}\,\delta\epsilon^2 + \frac{1}{3!}\rho_{\epsilon^3}\,\epsilon^3 + \ldots
\end{aligned} \tag{9}$$

(note that the factors of 1/2 and 1/3! serve no real purpose, because the $\rho$'s are essentially arbitrary and could be redefined to include these factors).

One might ask at this point: Why not add more derivatives, e.g., terms like $\nabla^2\delta$ or products of $\gamma_{ij}\gamma_{kl}\delta$? Also, why not make the dependence non-local, i.e.,

$$\rho_g(\mathbf{x}) = f[\delta(\mathbf{x}')], \tag{10}$$

where $\mathbf{x}'$ can be any position, not just the position $\mathbf{x}$ where we are measuring the density? It turns out that these things are related, as we will discuss further in §III A. As long as the non-locality is short range, it can be easily represented by a controlled series of higher derivative terms like $\nabla^2\delta$. Terms like $\gamma_{ij}\gamma_{kl}\delta$, which we will not consider, introduce new long-range $\nabla^{-2}$ operators, beyond the one already present in the construction of the gravitational potential.

One might also wonder about the eigenvalues of $s_{ij}$, $\lambda_i$ [121]: Are they not additional scalar quantities that are linear in the perturbation amplitude, and thus loopholes in the argument that linear order bias can only depend on $\delta$? In three dimensions, they are hard to write down explicitly, but the two dimensional version is informative: $\lambda_\pm = \pm\sqrt{\frac{1}{4}(s_{11} - s_{22})^2 + s_{12}^2}$. We see that these quantities are in some sense the same order as $\delta$, but they are not well behaved analytic functions of $s_{ij}$. This is illustrated by considering a similar, but simpler to understand, possible term, $|\delta| = \sqrt{\delta^2}$. At $\delta = 0$, $|\delta|$ is not differentiable, and it becomes especially obvious how unphysical this must be when we observe that local physics has no particular reason to see the mean density of the Universe as a special value. Similarly, it seems unlikely that it is physically correct for the dependence of galaxy density on $s_{11} - s_{22}$ (for $s_{12} = 0$) to make a sharp change of direction at $s_{11} - s_{22} = 0$ (which is just the transition from a tensor extended in the 1 direction to the 2 direction), as it would if we included terms linear in the eigenvalues. It is undoubtedly possible for the galaxy density to depend on these eigenvalues - the argument here is simply that this dependence should be higher than linear order. Our parameterization actually already includes this dependence very directly: $s^2 = s_{ij}s_{ji} = \lambda_i\lambda_i$, i.e., $s^2$ is the sum of squares of the eigenvalues.
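Both eigenvalue statements in this paragraph are easy to verify numerically. The sketch below is only an illustration: it draws random traceless symmetric matrices as stand-ins for $s_{ij}$ (the helper names are hypothetical), compares the two dimensional closed form with a numerical eigenvalue solver, and checks that $s_{ij}s_{ji}$ equals the sum of squared eigenvalues in three dimensions.

```python
import numpy as np


def traceless_symmetric(rng, dim):
    """Random symmetric matrix with its trace removed, standing in for s_ij."""
    a = rng.normal(size=(dim, dim))
    sym = 0.5 * (a + a.T)
    return sym - np.eye(dim) * np.trace(sym) / dim


def eigenvalues_2d(s):
    """Closed-form eigenvalues of a traceless symmetric 2x2 matrix, ascending."""
    lam = np.sqrt(0.25 * (s[0, 0] - s[1, 1]) ** 2 + s[0, 1] ** 2)
    return np.array([-lam, lam])


rng = np.random.default_rng(1)

s2d = traceless_symmetric(rng, 2)
lam2d = np.linalg.eigvalsh(s2d)  # eigvalsh returns eigenvalues in ascending order
print(lam2d, eigenvalues_2d(s2d))

s3d = traceless_symmetric(rng, 3)
lam3d = np.linalg.eigvalsh(s3d)
s_squared = np.trace(s3d @ s3d)  # s_ij s_ji
print(s_squared, np.sum(lam3d**2))
```

The square root in the closed form is what makes the eigenvalues non-analytic at $s_{ij} = 0$, while $s^2$ is a smooth quadratic function of the tensor components.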

The bottom line is: We stick to the terms that are obtained in a Taylor series in $\delta$, $\nabla_i v_j$, and $\nabla_i\nabla_j\phi$, with only short range (relative to the scale of observations) non-locality in the dependence of galaxy density on these quantities. We leave for the future the question of how completely general this approach is.

B. Statistics 

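The mean galaxy density is, to 3rd order in the initial perturbations,

$$\bar\rho_g = \langle\rho_g\rangle = \rho_0 + \frac{1}{2}\rho_{\delta^2}\,\sigma^2 + \frac{1}{3}\rho_{s^2}\,\sigma^2 + \frac{1}{2}\rho_{\epsilon^2}\,\sigma_\epsilon^2, \tag{11}$$

where $\sigma^2 = \langle\delta^2\rangle$, $\langle s^2\rangle = \frac{2}{3}\sigma^2$, and $\sigma_\epsilon^2 = \langle\epsilon^2\rangle$. Redefining all the coefficients after division by $\bar\rho_g$ gives

$$\begin{aligned}
\delta_g = {} & \rho_g/\bar\rho_g - 1 \\
= {} & c_\delta\,\delta + \frac{1}{2}c_{\delta^2}\left(\delta^2 - \sigma^2\right) + \frac{1}{2}c_{s^2}\left(s^2 - \frac{2}{3}\sigma^2\right) + \frac{1}{3!}c_{\delta^3}\,\delta^3 + \frac{1}{2}c_{\delta s^2}\,\delta s^2 + c_\psi\,\psi + c_{st}\,st + \frac{1}{3!}c_{s^3}\,s^3 \\
& + c_\epsilon\,\epsilon + c_{\delta\epsilon}\,\delta\epsilon + \frac{1}{2}c_{\delta^2\epsilon}\,\delta^2\epsilon + \frac{1}{2}c_{s^2\epsilon}\,s^2\epsilon + \frac{1}{2}c_{\epsilon^2}\left(\epsilon^2 - \sigma_\epsilon^2\right) + \frac{1}{2}c_{\delta\epsilon^2}\,\delta\epsilon^2 + \frac{1}{3!}c_{\epsilon^3}\,\epsilon^3 + \ldots
\end{aligned} \tag{12}$$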
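The relation between the variance of the tidal tensor and the density variance, $\langle s^2\rangle = \frac{2}{3}\sigma^2$, follows from the Fourier-space form of the operator $\gamma_{ij}$. Assuming $\gamma_{ij}(\hat{q}) = \hat{q}_i\hat{q}_j - \frac{1}{3}\delta^K_{ij}$ acting on each linear mode, the contraction $\gamma_{ij}\gamma_{ji}$ is the same for every direction, so it factors out of the integral over the power spectrum. A minimal numerical sketch (the function name is hypothetical):

```python
import numpy as np


def gamma_tensor(q):
    """Fourier-space tidal operator q_i q_j / q^2 - delta_ij / 3 for wavevector q."""
    q_hat = q / np.linalg.norm(q)
    return np.outer(q_hat, q_hat) - np.eye(3) / 3.0


rng = np.random.default_rng(2)
wavevectors = rng.normal(size=(100, 3))

# gamma_ij gamma_ji for each wavevector; gamma is symmetric, so this is the
# sum of its squared elements.
contractions = np.array([np.sum(gamma_tensor(q) ** 2) for q in wavevectors])
print(contractions.min(), contractions.max())  # both equal 2/3 up to rounding
```

Because the contraction does not depend on the direction or magnitude of the wavevector, the variance of $s^2$ is this constant times the integral of $P(q)$, i.e., times $\sigma^2$.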
1. Galaxy-mass cross-spectrum



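For simplicity, we start by calculating the mass density-galaxy density cross-spectrum, i.e., $\langle\delta_m(\mathbf{k})\,\delta_g(\mathbf{k}')\rangle = (2\pi)^3\,\delta^D(\mathbf{k} + \mathbf{k}')\,P_{mg}(k)$, which is

$$\begin{aligned}
P_{mg}(k) ={}& c_\delta\,P_{NL}(k) \\
&+ c_{\delta^2}\int\frac{d^3q}{(2\pi)^3}\,P(q)\,P(|\mathbf{k}-\mathbf{q}|)\,F_S^{(2)}(\mathbf{q},\mathbf{k}-\mathbf{q}) + \frac{34}{21}c_{\delta^2}\,\sigma^2 P(k) \\
&+ c_{s^2}\int\frac{d^3q}{(2\pi)^3}\,P(q)\,P(|\mathbf{k}-\mathbf{q}|)\,F_S^{(2)}(\mathbf{q},\mathbf{k}-\mathbf{q})\,S(\mathbf{q},\mathbf{k}-\mathbf{q}) \\
&+ 2\,c_{s^2}\,P(k)\int\frac{d^3q}{(2\pi)^3}\,P(q)\,F_S^{(2)}(-\mathbf{q},\mathbf{k})\,S(\mathbf{q},\mathbf{k}-\mathbf{q}) \\
&+ \frac{1}{2}c_{\delta^3}\,\sigma^2 P(k) + \frac{1}{3}c_{\delta s^2}\,\sigma^2 P(k) + \frac{1}{2}c_{\delta\epsilon^2}\,\sigma_\epsilon^2 P(k) \\
&+ 2\,c_\psi\,P(k)\int\frac{d^3q}{(2\pi)^3}\,P(q)\left[\frac{3}{2}D_S^{(3)}(\mathbf{q},-\mathbf{q},-\mathbf{k}) - 2\,F_S^{(2)}(-\mathbf{q},\mathbf{k})\,D_S^{(2)}(\mathbf{q},\mathbf{k}-\mathbf{q})\right] \\
&+ 2\,c_{st}\,P(k)\int\frac{d^3q}{(2\pi)^3}\,P(q)\,D_S^{(2)}(-\mathbf{q},\mathbf{k})\,S(\mathbf{q},\mathbf{k}-\mathbf{q}).
\end{aligned} \tag{13}$$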

See the Appendix for definitions of $F_S^{(2)}$, $S$, and $D_S^{(n)}$. $P_{NL}(k)$ is the non-linear mass power spectrum. $P(k)$ with no
subscript always refers to the linear theory mass power. Note that the $s^3$ term works out to exactly zero, so the
parameter $c_{s^3}$ has been rendered irrelevant.

As we found in [59], some terms like $\frac{1}{2}c_{\delta^3}\sigma^2 P(k)$ appear which are best treated as renormalizations of the linear
theory bias, i.e., by a redefinition like $c'_\delta = c_\delta + \frac{1}{2}c_{\delta^3}\sigma^2$. As discussed in [59], the un-smoothed density variance
$\sigma^2 = \langle\delta^2\rangle$ may not be literally infinite, depending on the power spectrum, but it will be large, and sensitive to the
deeply non-linear regime where all of our calculations are meaningless. It is best to think of the original $c_\delta$ as an
un-observable "bare" parameter, with the observable linear bias factor being largely un-related to it, because the latter is the sum of
many higher order terms which are generally much larger. This idea that the values of the parameters of large-scale
galaxy clustering are generated by small-scale, higher order effects is physically reasonable, or even expected: after
all, if there were truly only small, linearizable, perturbations in the Universe, there would be no galaxies.

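The term associated with $st$ has an interesting new feature. In the $k \to 0$ limit, we find

$$2\,c_{st}\,P(k)\int\frac{d^3q}{(2\pi)^3}\,P(q)\,D_S^{(2)}(-\mathbf{q},\mathbf{k})\,S(\mathbf{q},\mathbf{k}-\mathbf{q}) \;\xrightarrow{\;k\to0\;}\; -\frac{16}{63}\,c_{st}\,\sigma^2 P(k). \tag{14}$$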



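Like the $\delta^3$ term, for example, this looks like a renormalization of the linear bias; however, unlike the $\delta^3$ term, here
there is non-trivial $k$ dependence as one goes to non-zero $k$. This case provides an opportunity to demonstrate how
the renormalization works more clearly. Defining $r = q/k$, $\mu = \mathbf{k}\cdot\mathbf{q}/(k\,q)$, and

$$I(r) = \frac{105}{32}\int_{-1}^{1}d\mu\;D_S^{(2)}(-\mathbf{q},\mathbf{k})\,S(\mathbf{q},\mathbf{k}-\mathbf{q}), \tag{15}$$

we have

$$\int\frac{d^3q}{(2\pi)^3}\,P(q)\,D_S^{(2)}(-\mathbf{q},\mathbf{k})\,S(\mathbf{q},\mathbf{k}-\mathbf{q}) = \frac{16}{105}\int d\ln r\;\Delta^2(kr)\,I(r), \tag{16}$$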



where $\Delta^2(q) = q^3 P(q)/2\pi^2$. $I(r)$ gives the weight function over which one must integrate $\Delta^2(q)$ to obtain the bias
term. Figure 2 shows a plot of $I(r)$. We see that $I(r)$ is constant as $r \to \infty$. This leads to the constant result as
$k \to 0$, and is clearly undesirable as it represents sensitivity to arbitrarily small, highly non-linear scales. The solution
is to subtract the $k \to 0$ result, i.e., $-\frac{16}{63}c_{st}\sigma^2 P(k)$, from this term, and add it to the linear theory bias. The
remainder is

$$2\,c_{st}\,P(k)\left[\int\frac{d^3q}{(2\pi)^3}\,P(q)\,D_S^{(2)}(-\mathbf{q},\mathbf{k})\,S(\mathbf{q},\mathbf{k}-\mathbf{q}) + \frac{8}{63}\sigma^2\right] = c_{st}\,P(k)\,\frac{32}{105}\int d\ln r\;\Delta^2(kr)\,I_R(r), \tag{17}$$
FIG. 2: Weighting kernel over which $\Delta^2(q = rk)$ is integrated to obtain the contribution of several terms to $P_{mg}(k)$, plotted against $r = q/k$ (horizontal axis from 0.1 to 100). The
dotted line shows $I(r)$, defined by Eq. (15), which is sensitive to high-$k$ power. The solid line shows the kernel after renormalization
of the linear bias, $I_R(r) = I(r) + 5/6$, which now acts as a filter to produce the variance of the density field smoothed
on scale $k$.



where

$$I_R(r) = I(r) + 5/6 \tag{18}$$

now looks like a smoothing kernel, with no sensitivity to power for $r \gg 1$, i.e., $q \gg k$ (the factor 105/32 was
chosen to make $I_R(r \to 0) \to 1$, i.e., to look like the Fourier transform of a mass conserving smoothing kernel). The
change in bias due to this term at observed scale $k$ is quite simply proportional to the variance on scale $k$, as defined
by the weighting function $I_R(r)$.
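
The limiting behavior of the kernel can be checked numerically. The sketch below is a minimal illustration, not the paper's code: it assumes the standard second-order relation $D_S^{(2)} = \frac{2}{7}\left(S - \frac{2}{3}\right)$ with $S(\mathbf{q}_1,\mathbf{q}_2) = (\hat{\mathbf{q}}_1\cdot\hat{\mathbf{q}}_2)^2 - \frac{1}{3}$, takes the angular integral over $\mu \in [-1, 1]$, and uses hypothetical function names.

```python
import math

def S_kernel(cos_angle):
    """Tidal kernel S: squared cosine between the two wavevectors minus 1/3."""
    return cos_angle ** 2 - 1.0 / 3.0

def D2_kernel(cos_angle):
    """Assumed 2nd-order kernel D_S^(2) = (2/7) (S - 2/3) = (2/7) (cos^2 - 1)."""
    return (2.0 / 7.0) * (cos_angle ** 2 - 1.0)

def I(r, n=4000):
    """I(r) = (105/32) * integral over mu of D2(-q, k) * S(q, k - q), with q = r k."""
    dmu = 2.0 / n
    total = 0.0
    for i in range(n):
        mu = -1.0 + (i + 0.5) * dmu  # midpoint rule; avoids the endpoints mu = +-1
        # Cosine of the angle between q and k - q, in units where |k| = 1, |q| = r.
        cos_q_kq = (mu - r) / math.sqrt(1.0 - 2.0 * r * mu + r * r)
        total += D2_kernel(-mu) * S_kernel(cos_q_kq) * dmu
    return 105.0 / 32.0 * total

def I_R(r):
    """Renormalized kernel: the constant large-r tail of I(r) is removed."""
    return I(r) + 5.0 / 6.0

for r in (0.01, 0.1, 1.0, 10.0, 100.0):
    print(r, I(r), I_R(r))
```

Under these assumptions $I(r)$ tends to $1/6$ for $r \ll 1$ and to $-5/6$ for $r \gg 1$, so $I_R(r)$ runs from 1 to 0, as a smoothing kernel should.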
A similar procedure must be followed with the second $s^2$ term, i.e.,

$$2\,c_{s^2}\,P(k)\int\frac{d^3q}{(2\pi)^3}\,P(q)\,F_S^{(2)}(-\mathbf{q},\mathbf{k})\,S(\mathbf{q},\mathbf{k}-\mathbf{q}) \;\xrightarrow{\;k\to0\;}\; \frac{68}{63}\,c_{s^2}\,\sigma^2 P(k). \tag{19}$$



All of the other terms go to zero for small $k$. As in [59], we now define the observable, renormalized, linear bias as
the sum of bias-like terms

$$b_\delta = c_\delta + \left[\frac{34}{21}c_{\delta^2} + \frac{68}{63}c_{s^2} + \frac{1}{2}c_{\delta^3} + \frac{1}{3}c_{\delta s^2} - \frac{16}{63}c_{st}\right]\sigma^2 + \frac{1}{2}c_{\delta\epsilon^2}\,\sigma_\epsilon^2. \tag{20}$$

Note that this is the only appearance of the parameters $c_{\delta^3}$, $c_{\delta s^2}$, and $c_{\delta\epsilon^2}$, so they are no longer needed. In fact, the
random noise variable $\epsilon$ has completely disappeared, just like it would have if it were only included as a single term at
the end of the Taylor series.
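
The numerical coefficients of the $\sigma^2$ terms in the renormalized bias can be cross-checked with exact rational arithmetic. The sketch below assumes the standard second-order kernels (angle-even part of $F_S^{(2)}$ equal to $\frac{5}{7} + \frac{2}{7}\mu^2$, and $D_S^{(2)} = \frac{2}{7}(\mu^2 - 1)$) and uses the fact that $S(\mathbf{q},\mathbf{k}-\mathbf{q}) \to S(\mathbf{q},-\mathbf{q}) = \frac{2}{3}$ as $k \to 0$; the variable names are hypothetical.

```python
from fractions import Fraction as Fr

mu2_avg = Fr(1, 3)  # angle average of mu^2 over the sphere

# k -> 0 limits (q >> k); the part of F_S^(2) that is odd in mu averages to zero.
F2_avg = Fr(5, 7) + Fr(2, 7) * mu2_avg   # <F_S^(2)(-q, k)>
D2_avg = Fr(2, 7) * (mu2_avg - 1)        # <D_S^(2)(-q, k)>
S_limit = Fr(2, 3)                       # S(q, -q) = 1 - 1/3

coef_delta2 = 2 * F2_avg                 # multiplies c_{delta^2} sigma^2 P(k)
coef_s2 = 2 * F2_avg * S_limit           # multiplies c_{s^2} sigma^2 P(k)
coef_st = 2 * D2_avg * S_limit           # multiplies c_{st} sigma^2 P(k)
coef_delta3 = 3 * Fr(1, 6)               # three Wick pairings times 1/3!
coef_delta_s2 = Fr(1, 2) * Fr(2, 3)      # (1/2) <s^2> / sigma^2

print(coef_delta2, coef_s2, coef_st, coef_delta3, coef_delta_s2)
```

Under these assumptions the values come out as 34/21, 68/63, -16/63, 1/2, and 1/3.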

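The $P_{mg}(k)$ result simplifies even more when we find, somewhat surprisingly, that the three terms proportional
to $P(k)$ in Eq. (13) are exactly proportional to each other, after renormalization and angle-integration. This means
that we can define one merged term that accounts for all of them, i.e.,

$$\begin{aligned}
c_3\,\sigma_3^2(k)\,P(k) ={}& 2\,c_{s^2}\,P(k)\left[\int\frac{d^3q}{(2\pi)^3}\,P(q)\,F_S^{(2)}(-\mathbf{q},\mathbf{k})\,S(\mathbf{q},\mathbf{k}-\mathbf{q}) - \frac{34}{63}\sigma^2\right] \\
&+ 2\,c_\psi\,P(k)\int\frac{d^3q}{(2\pi)^3}\,P(q)\left[\frac{3}{2}D_S^{(3)}(\mathbf{q},-\mathbf{q},-\mathbf{k}) - 2\,F_S^{(2)}(-\mathbf{q},\mathbf{k})\,D_S^{(2)}(\mathbf{q},\mathbf{k}-\mathbf{q})\right] \\
&+ 2\,c_{st}\,P(k)\left[\int\frac{d^3q}{(2\pi)^3}\,P(q)\,D_S^{(2)}(-\mathbf{q},\mathbf{k})\,S(\mathbf{q},\mathbf{k}-\mathbf{q}) + \frac{8}{63}\sigma^2\right] \\
={}& \frac{32}{105}\left(c_{st} - \frac{5}{2}c_{s^2} + \frac{16}{21}c_\psi\right)\sigma_3^2(k)\,P(k),
\end{aligned} \tag{21}$$

where

$$\sigma_3^2(k) = \int d\ln r\;\Delta^2(kr)\,I_R(r). \tag{22}$$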



Note that the inclusion of the $s^2$ term in this redefinition is convenient but not at all necessary, because it is perfectly
well-behaved, and the redefinition does not remove all appearances of the parameter $c_{s^2}$. The reason to include this
term in the redefinition is that, presumably, a fit to data using $c_{s^2}$ and $c_3$ will show less degeneracy between the two
parameters if the functions they multiply do not have substantial components which have identical form.

Finally, we define normalized parameters $b_{\delta^2} = c_{\delta^2}/b_\delta$, $b_{s^2} = c_{s^2}/b_\delta$, and $b_3 = c_3/b_\delta$ to produce the power spectrum

$$P_{mg}(k) = b_\delta\left[P_{NL}(k) + b_3\,\sigma_3^2(k)\,P(k) + \int\frac{d^3q}{(2\pi)^3}\,P(q)\,P(|\mathbf{k}-\mathbf{q}|)\,F_S^{(2)}(\mathbf{q},\mathbf{k}-\mathbf{q})\left[b_{\delta^2} + b_{s^2}\,S(\mathbf{q},\mathbf{k}-\mathbf{q})\right]\right]. \tag{23}$$



The final expression has two new terms relative to the version from the $\delta$-only Taylor series in [59]. The term
associated with $\sigma_3^2$ is more like a true $k$-dependent bias, in the sense that the power at a given $k$ is still proportional
to the matter power spectrum at that $k$, just multiplied by a $k$-dependent factor; while the other term, associated
with $b_{s^2}$, mixes power from a range of scales. These terms come from the correlation of the linear and second order
parts, respectively, of the mass density field with the galaxy field. Figure 3 shows the effect of all the terms, for a
typical $\Lambda$CDM model, at $z = 1$. We see that the $b_{s^2}$ term is actually quite small relative to the others, for similar
values of the bias parameters. In this paper the parameter values are completely arbitrary, simply chosen to make
the different effects comparable in size in the more easily observable galaxy-galaxy power spectrum, $P_{gg}$, where the
effect of the $b_{s^2}$ term is substantially larger (Fig. 4). The $b_3$ term, on the other hand, can have a larger effect on $P_{mg}$,
relative to its effect on $P_{gg}$.

Note that we could have, completely equivalently, left $\eta$ as our independent variable while redefining $s^2$ to make it
non-zero only at 3rd order. All differences in the resulting equations would be numerical factors which can be removed
by redefining the parameters. The least trivial looking of these changes would be changing the $b_{s^2}\,S(\mathbf{q},\mathbf{k}-\mathbf{q})$ term in
Eq. (23) to $b_\eta\,D_S^{(2)}(\mathbf{q},\mathbf{k}-\mathbf{q})$; however, the simple relation between $D_S^{(2)}$ and $S$ (Eq. 44) means that this change is
equivalent to redefining $b_{s^2}$ and $b_{\delta^2}$.
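
The equivalence between the $S$ and $D_S^{(2)}$ parameterizations can be illustrated numerically. The sketch below assumes the standard symmetrized second-order kernels for the density and the velocity divergence (these explicit forms are an assumption here, since the Appendix is not reproduced in this section), defines $D_S^{(2)}$ as their difference, and checks that it equals $\frac{2}{7}\left(S - \frac{2}{3}\right)$. All function names are hypothetical.

```python
import random

def F2(q1, q2, mu):
    """Standard symmetrized 2nd-order density kernel (assumed form)."""
    return 5.0 / 7.0 + 0.5 * mu * (q1 / q2 + q2 / q1) + (2.0 / 7.0) * mu ** 2

def G2(q1, q2, mu):
    """Standard symmetrized 2nd-order velocity-divergence kernel (assumed form)."""
    return 3.0 / 7.0 + 0.5 * mu * (q1 / q2 + q2 / q1) + (4.0 / 7.0) * mu ** 2

def S(mu):
    """Tidal kernel: squared cosine between the wavevectors minus 1/3."""
    return mu ** 2 - 1.0 / 3.0

def D2(q1, q2, mu):
    """2nd-order kernel of eta = theta - delta."""
    return G2(q1, q2, mu) - F2(q1, q2, mu)

random.seed(1)
max_err = 0.0
for _ in range(1000):
    q1 = random.uniform(0.1, 10.0)
    q2 = random.uniform(0.1, 10.0)
    mu = random.uniform(-1.0, 1.0)
    max_err = max(max_err, abs(D2(q1, q2, mu) - (2.0 / 7.0) * (S(mu) - 2.0 / 3.0)))
print(max_err)
```

Because $D_S^{(2)}$ differs from $S$ only by a rescaling and an additive constant, trading one for the other rescales the tidal parameter and shifts $b_{\delta^2}$, with no change in the predicted spectrum.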
FIG. 3: Bias terms in Eq. (23), for the galaxy-mass cross-power spectrum, at $z = 1$, plotted against $k$ [$h$/Mpc] (horizontal axis from 0.05 to 0.2). The black (solid) line shows the term
proportional to $b_{\delta^2}$, red (dashed) shows $b_{s^2}$, and green (dotted) shows $b_3$. The coefficient values are chosen to match those in
the more important galaxy-galaxy power spectrum shown in Fig. 4.

2. Galaxy-Galaxy power spectrum 

We now compute the cross-power spectrum between two types of galaxies, each with a set of bias parameters 
represented by the letters a and b. The power spectrum of a single type of galaxy is of course obtained by taking 
equal bias parameters for each type.



P 



ab (k) = asbs (PNL(fc) 
d 3 q 



0,3 + b 3 



4(k) p{k) 



I. f d 3 q 



P (q) P (|k - q|) F| 2) (q, k - q) ~a S 2 +b S 2 + [ ~a s 2 +b s2 )S (q, k q 



(27T) 



P(q)P(\k- q|) 0526,52 + (a s 2fe 5 2 + ^2^2) 5 (q,k- q) +a s 26 s 2 5 (q, k - q) 



1 + ^a 5 2 e + b S 2^j ^- + ^a s 2 e + b s 2^j ^- + i ^a £ 3(T e 2 a 2 + b^a 2 tb ^j + a 5e & 5e cr 2 + a e 2& £ 2^p 



(24) 



The first two lines in Eq. (24) are the terms proportional to the linear bias factor of one type of galaxy or the other,
and are thus essentially just the $P_{mg}$ result re-written (including already all of the same renormalizations). The third
line contains the new terms due to cross-products of the 2nd order bias factors. The last line contains cross-terms
involving the random variables $\epsilon_a$ and $\epsilon_b$, which we have taken to be possibly locally correlated with cross-power
spectrum $P^\epsilon_{ab}$, and cross-variance $\sigma^2_{\epsilon ab} = \langle\epsilon_a\epsilon_b\rangle$.

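In the $k \to 0$ limit the new terms in the third line of Eq. (24) are not zero, but are $k$-independent, i.e., they look
like locally correlated white noise:

$$\begin{aligned}
&\frac{1}{2}\int\frac{d^3q}{(2\pi)^3}\,P(q)\,P(|\mathbf{k}-\mathbf{q}|)\left[a_{\delta^2}b_{\delta^2} + \left(a_{s^2}b_{\delta^2} + a_{\delta^2}b_{s^2}\right)S(\mathbf{q},\mathbf{k}-\mathbf{q}) + a_{s^2}b_{s^2}\,S(\mathbf{q},\mathbf{k}-\mathbf{q})^2\right] \\
&\qquad\xrightarrow{\;k\to0\;}\; \frac{1}{2}\left(a_{\delta^2} + \frac{2}{3}a_{s^2}\right)\left(b_{\delta^2} + \frac{2}{3}b_{s^2}\right)\int\frac{d^3q}{(2\pi)^3}\,P(q)^2.
\end{aligned} \tag{25}$$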



It is interesting to note that these shot-noise-like terms in the power spectrum come from the same terms in the
original galaxy density Taylor series which produced a non-zero contribution to the mean density. This is consistent
with our expectation that white noise must be associated with non-conservation of the field. As in [59], we can absorb
these constant terms into the observable noise matrix, but first we need to discuss the $\epsilon$-related terms.
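
The factorized form of the $k \to 0$ limit follows from $S(\mathbf{q},\mathbf{k}-\mathbf{q}) \to S(\mathbf{q},-\mathbf{q}) = \frac{2}{3}$. The sketch below checks the algebra with exact fractions; it assumes the quadratic bracket quoted in the limit above, and the function names and parameter values are hypothetical.

```python
from fractions import Fraction as Fr

S0 = Fr(2, 3)  # S(q, k - q) evaluated at k -> 0, i.e. S(q, -q) = 1 - 1/3

def bracket_k0(a_d2, a_s2, b_d2, b_s2):
    """k -> 0 value of a_d2 b_d2 + (a_s2 b_d2 + a_d2 b_s2) S + a_s2 b_s2 S^2."""
    return a_d2 * b_d2 + (a_s2 * b_d2 + a_d2 * b_s2) * S0 + a_s2 * b_s2 * S0 ** 2

def factorized(a_d2, a_s2, b_d2, b_s2):
    """Product form (a_d2 + 2/3 a_s2) (b_d2 + 2/3 b_s2)."""
    return (a_d2 + S0 * a_s2) * (b_d2 + S0 * b_s2)

# Illustrative (made-up) bias values for two galaxy types.
params = (Fr(1, 2), Fr(-3, 4), Fr(2, 5), Fr(7, 3))
print(bracket_k0(*params), factorized(*params))
```

For a single galaxy type the product is a perfect square, so this white-noise-like constant is non-negative, which fits its interpretation as a noise contribution.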

We define the lowest order $\epsilon$-related term in the last line of Eq. (24) to be $N_{0ab} = a_\epsilon b_\epsilon P^\epsilon_{ab}$. If we were only calculating
to lowest order, this would be the usual galaxy shot-noise. The rest of the terms are also constants ($k$-independent),
so they can be simply interpreted as renormalizing this noise matrix, i.e., in spite of the apparent large number of
new terms, there is actually nothing new here at all. After renormalization, the result is a completely general effective
noise matrix for the galaxies, i.e., some choice of the bias parameters can produce any mathematically legitimate
matrix. Altogether, the formal redefinition is:



N ab = N 0ab 



1 ( "a- + '''<>-. ) y + (a s 2 e + 6 S 2 £ ) y + i (a e 3<7 2 a 2 + 6 £ 3(7 2 



d 3 q p , ^2 



b 2 + aseOseCr + a e 2 e 2 



'tab 



(26) 



The result that we should have a general free noise matrix is insensitive to assumptions about the form of the matrix
$P^\epsilon_{ab}$ - we could start by assuming that $\epsilon_a$ and $\epsilon_b$ are perfectly correlated (i.e., there is really only one random variable),
or perfectly independent, and in either case the renormalizations would generate the extra freedom. We do require
some intrinsic randomness, i.e., we cannot start with $P^\epsilon_{ab} = 0$ and rely entirely on the noise matrix generated by
the density fluctuations (if we want to allow for different types of galaxies to be uncorrelated, or correlated in a way
different from that given by the right-hand side of Eq. 25). This is somewhat unsatisfactory as the randomness in
the initial density field must ultimately be the source of randomness in the outcome - we speculate that higher order
density field terms will produce a general noise matrix, so that eventually there will be no need to give $\epsilon$ a seed
variance. Note that one should not think too hard about where a noise matrix that is nearly diagonal with elements
equal to the inverse mean number density of galaxies ($\bar n^{-1}$) comes from in this picture (aside from observing that
it is possible). The terms that appear on the right hand side of Eq. (26) do not need to add up to the observable
noise in any literal sense, because the observable noise will contain other, possibly even larger, terms at higher order.
Eq. (26) just shows why it is legitimate to drop the undesirable terms (that are non-zero as $k \to 0$, including all
of the $\epsilon$-related terms) in the PT calculation, i.e., because they are redundant with a free noise matrix. One should
remember that the idea of Poisson sampling, i.e., the $\bar n^{-1}$ model for noise power, was never more than an apparently
quite accurate guess - [77], for example, found deviations for dark matter halos.
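We are left with the final power spectrum:

$$\begin{aligned}
P_{ab}(k) = a_\delta b_\delta\Bigg[\,& P_{NL}(k) + \left(a_3 + b_3\right)\sigma_3^2(k)\,P(k) \\
&+ \int\frac{d^3q}{(2\pi)^3}\,P(q)\,P(|\mathbf{k}-\mathbf{q}|)\,F_S^{(2)}(\mathbf{q},\mathbf{k}-\mathbf{q})\left[a_{\delta^2} + b_{\delta^2} + \left(a_{s^2} + b_{s^2}\right)S(\mathbf{q},\mathbf{k}-\mathbf{q})\right] \\
&+ \frac{1}{2}\int\frac{d^3q}{(2\pi)^3}\,P(q)\bigg\{a_{\delta^2}b_{\delta^2}\left[P(|\mathbf{k}-\mathbf{q}|) - P(q)\right] \\
&\qquad + \left(a_{s^2}b_{\delta^2} + a_{\delta^2}b_{s^2}\right)\left[S(\mathbf{q},\mathbf{k}-\mathbf{q})\,P(|\mathbf{k}-\mathbf{q}|) - \frac{2}{3}P(q)\right] \\
&\qquad + a_{s^2}b_{s^2}\left[S(\mathbf{q},\mathbf{k}-\mathbf{q})^2\,P(|\mathbf{k}-\mathbf{q}|) - \frac{4}{9}P(q)\right]\bigg\}\Bigg] + N_{ab}.
\end{aligned} \tag{27}$$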



This equation is not as complicated as it may look, including only a few simple building blocks: $P(q)$, $P(|\mathbf{k}-\mathbf{q}|)$,
$F_S^{(2)}(\mathbf{q},\mathbf{k}-\mathbf{q})$, $S(\mathbf{q},\mathbf{k}-\mathbf{q})$, and $I_R(r)$ (in $\sigma_3^2(k)$).

Figure 4 shows examples of the auto-power spectrum for a single type of galaxy. We see that the effects of each
term are somewhat different. The $b_3$ term has a greater influence at larger relative to smaller scales than the $b_{\delta^2}$ term.
Those two terms can have either sign, but the $b_{s^2}$ term is essentially always negative. Note that the power spectrum is
not linear in the bias parameters, so the outcome when all of the parameters are varied is more complex than a simple
sum of the examples we show. The increase due to the $b_{\delta^2}$ term actually reaches a maximum (for $k = 0.2\,h\,\mathrm{Mpc}^{-1}$)
at $b_{\delta^2} \sim 0.6$, before declining again as the negative quadratic part comes to dominate (this transition is apparent as
the flattening at the high $k$ end in the figure).



3. Bispectrum 

The bispectrum is the three point correlation function [122, 123, 124, 125, 126] in Fourier space. It vanishes if the
density fluctuations are Gaussian. The bispectrum can be used to measure non-Gaussianity in the primordial density
distribution, if any, and non-Gaussianity induced by non-linear gravitational evolution and bias B EM EM EM Elf.
[131, 132] show that the bispectrum is a very powerful addition to the power spectrum for general cosmological
parameter constraints, especially on the primordial power spectrum amplitude and slope. In this section we show the
form of the galaxy bispectrum in our generalized bias model. Only 2nd order terms in the density perturbations are
needed to construct the bispectrum to 4th order. By definition, the bispectrum takes the following form

$$\langle\delta(\mathbf{k}_1)\,\delta(\mathbf{k}_2)\,\delta(\mathbf{k}_3)\rangle = (2\pi)^3\,\delta^D(\mathbf{k}_1 + \mathbf{k}_2 + \mathbf{k}_3)\,B(k_1, k_2, k_3), \tag{28}$$

where $\delta^D(\mathbf{k}_1 + \mathbf{k}_2 + \mathbf{k}_3)$ means that only closed triangular configurations are non-zero. In our calculations, we assume
that the primordial density fluctuations did not have any signature of non-Gaussianity. The galaxy bispectrum is
then

$$B_g(k_1, k_2, k_3) = b_\delta^3\,P(k_1)\,P(k_2)\left[2\,F_S^{(2)}(\mathbf{k}_1,\mathbf{k}_2) + b_{\delta^2} + b_{s^2}\,S(\mathbf{k}_1,\mathbf{k}_2)\right] + 2\,b_{\delta\epsilon}\,N\,b_\delta^2\,P(k_1) + b_{\epsilon^2}\,N^2 + \text{cyclic permutations of } k_1, k_2, k_3, \tag{29}$$

where we note that the angle between any two of the $\mathbf{k}$ vectors is determined by the length of the third. The parameters
$b_{\delta\epsilon}$ and $b_{\epsilon^2}$ are defined from the $\epsilon$-related coefficients, and $N$ is the noise power. Here we see directly the convergence between Eulerian
and Lagrangian bias that we were hoping for - the new $s^2$ term introduces the extra configuration dependence in the
bispectrum found for Lagrangian bias by [M EM Ell. Note that [133] actually compare Lagrangian vs. traditional
(density-only) Eulerian bias in fits to the PSCz bispectrum, but did not have enough statistical power to distinguish
them (Eulerian bias was slightly preferred).

We see now the purpose in the introduction of the full structure of $\epsilon$-related terms. These terms have produced
exactly the structure needed to correctly represent Poisson noise in the bispectrum. If the galaxies were a Poisson
sampling of the underlying biased density field, we would have $b_{\delta\epsilon} = \frac{1}{2}$ and $b_{\epsilon^2} = \frac{1}{3}$ [MEM]. Even the appearance
of the extra new free parameters, $b_{\delta\epsilon}$ and $b_{\epsilon^2}$, is necessary, as [M] showed that galaxies in simple halo models do not
obey Poisson sampling exactly, but instead follow the more general form we find here, with the values of $b_{\delta\epsilon}$ and $b_{\epsilon^2}$
depending on the details of the model (in fact, our introduction of this treatment of noise was entirely motivated by 
A reduced bispectrum, which does not depend on the mass power spectrum amplitude, is often written as

$$Q_x(k_1, k_2, k_3) = \frac{B_x(k_1, k_2, k_3)}{P_x(k_1)\,P_x(k_2) + P_x(k_2)\,P_x(k_3) + P_x(k_3)\,P_x(k_1)}. \tag{30}$$

The reduced galaxy bispectrum, to leading order, is then

$$Q_g(k_1, k_2, k_3) = b_\delta^{-1}\left[Q_m(k_1, k_2, k_3) + b_{\delta^2} + b_{s^2}\,\frac{P(k_1)\,P(k_2)\,S(\mathbf{k}_1,\mathbf{k}_2) + \text{cyclic perms.}}{P(k_1)\,P(k_2) + P(k_2)\,P(k_3) + P(k_3)\,P(k_1)}\right], \tag{31}$$



where $Q_m$ is the reduced bispectrum of the mass density perturbations. The noise terms, which we have dropped
from this presentation of $Q_g$, undermine the elegance of using $Q_g$. We suspect that it will be more straightforward to
interpret noisy observations using a simultaneous fit to $P_g$ and $B_g$, rather than going through $Q_g$.

Figure 5 shows some examples of the reduced bispectrum and bias terms. $Q_g$ has been discussed as a means to
measure $b_\delta$, because, unlike $P_g$, it is only sensitive to $b_\delta$, not to the amplitude of the mass power spectrum. It has
always been necessary to marginalize over $b_{\delta^2}$ in this approach [133, 134], and now we have an extra possibility,
degeneracy with $b_{s^2}$. It is still possible to measure all the parameters independently, because $F_S^{(2)}$ and $S$ differ by
more than an additive constant; however, it would be helpful if a plausible upper limit on $b_{s^2}$ could be determined
using simulations.
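
To make the configuration dependence concrete, the sketch below evaluates the leading-order reduced bispectrum (noise terms dropped) for a closed triangle. It assumes the standard symmetrized second-order density kernel for $F_S^{(2)}$, a toy power-law $P(k)$, and hypothetical function names and bias values; it is an illustration under those assumptions only.

```python
def F2(k1, k2, mu):
    """Standard symmetrized 2nd-order density kernel (assumed form)."""
    return 5.0 / 7.0 + 0.5 * mu * (k1 / k2 + k2 / k1) + (2.0 / 7.0) * mu ** 2

def S(mu):
    """Tidal kernel: squared cosine between the wavevectors minus 1/3."""
    return mu ** 2 - 1.0 / 3.0

def cos_between(k1, k2, k3):
    """Cosine of the angle between k1 and k2, fixed by closure k1 + k2 + k3 = 0."""
    return (k3 ** 2 - k1 ** 2 - k2 ** 2) / (2.0 * k1 * k2)

def P_toy(k, slope=-1.5):
    """Toy power-law linear power spectrum (an assumption, arbitrary amplitude)."""
    return k ** slope

def Q_terms(k1, k2, k3, P=P_toy):
    """Return (Q_m, weighted S term) for the triangle (k1, k2, k3)."""
    pairs = [(k1, k2, k3), (k2, k3, k1), (k3, k1, k2)]
    denom = sum(P(a) * P(b) for a, b, _ in pairs)
    q_m = sum(2.0 * F2(a, b, cos_between(a, b, c)) * P(a) * P(b) for a, b, c in pairs) / denom
    q_s = sum(S(cos_between(a, b, c)) * P(a) * P(b) for a, b, c in pairs) / denom
    return q_m, q_s

def Q_g(k1, k2, k3, b_delta, b_delta2, b_s2):
    """Leading-order reduced galaxy bispectrum with noise terms dropped."""
    q_m, q_s = Q_terms(k1, k2, k3)
    return (q_m + b_delta2 + b_s2 * q_s) / b_delta

print(Q_terms(0.1, 0.1, 0.1))                 # equilateral triangle
print(Q_terms(0.1, 0.05, 0.14))               # a more elongated triangle
print(Q_g(0.1, 0.1, 0.1, 1.5, 0.3, -0.2))     # made-up bias values
```

The $b_{\delta^2}$ contribution is the same for every triangle, while the $b_{s^2}$ contribution changes with shape, which is what allows the two to be separated in principle.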

Other higher order statistics, like Fourier phase statistics [135], the trispectrum [131], or the probability distribution
function of counts in cells [136], could also be considered.

III. MISCELLANEOUS FURTHER EXTENSIONS 

In this section we discuss a few further extensions of the baseline approach to bias outlined in the previous section. 
In §III A we discuss additional short-range non-locality in the bias relation. In §III B we discuss briefly the new
considerations that arise when one goes to redshift space. Finally, in §III C we discuss non-Gaussian initial conditions.

A. Short-range non-locality 

So far, our model has included non-local dependence of the galaxy density on the mass density, but only in the form
of local dependence on $\nabla_i\nabla_j\phi$ and $\nabla_i v_j$, which are in turn determined by the density field through gravitational evolution.
For completeness, we now consider relatively short-range non-locality that might be caused by hydrodynamics
or the highly non-linear details of galaxy formation, i.e.,

$$\delta_g(\mathbf{x}) = f\left[\delta(\mathbf{x}')\right], \tag{32}$$

where the galaxy density at $\mathbf{x}$ depends on the mass density at all points $\mathbf{x}'$ roughly obeying $|\mathbf{x} - \mathbf{x}'| < R$. We assume
that $R$ is small in the sense that $k^2R^2 \ll 1$, where $k$ is the observed wavenumber. First, we expand $\delta_g$ as a Taylor
series in $\delta$, i.e.,

$$\delta_g(\mathbf{x}) = f\left[\delta(\mathbf{x}')\right] = f[0] + \int d\mathbf{x}'\,K(|\mathbf{x} - \mathbf{x}'|)\,\delta(\mathbf{x}') + \ldots, \tag{33}$$

where $K(|\mathbf{x} - \mathbf{x}'|)$ is the kernel of derivatives of galaxy density at $\mathbf{x}$ with respect to mass density at $\mathbf{x}'$. We allow an
almost arbitrary form for $K$, except that it must fall to zero outside a typical scale $R$, and it must be isotropic. We
now shift the integration variable to $\Delta\mathbf{x} = \mathbf{x}' - \mathbf{x}$, and Taylor expand in $\Delta\mathbf{x}$, keeping only the term linear in $\delta$ from Eq. (33), i.e.,



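$$\begin{aligned}
\delta_g(\mathbf{x}) &= \int d\Delta\mathbf{x}\;K(|\Delta\mathbf{x}|)\,\delta(\mathbf{x} + \Delta\mathbf{x}) = \int d\Delta\mathbf{x}\;K(|\Delta\mathbf{x}|)\left[\delta(\mathbf{x}) + \frac{d\delta}{dx_i}(\mathbf{x})\,\Delta x_i + \frac{1}{2}\frac{d^2\delta}{dx_i\,dx_j}(\mathbf{x})\,\Delta x_i\Delta x_j + \ldots\right] \\
&= \delta(\mathbf{x})\int d\Delta\mathbf{x}\;K(|\Delta\mathbf{x}|) + \frac{d\delta}{dx_i}(\mathbf{x})\int d\Delta\mathbf{x}\;K(|\Delta\mathbf{x}|)\,\Delta x_i + \frac{1}{2}\frac{d^2\delta}{dx_i\,dx_j}(\mathbf{x})\int d\Delta\mathbf{x}\;K(|\Delta\mathbf{x}|)\,\Delta x_i\Delta x_j + \ldots
\end{aligned} \tag{34}$$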



(this derivation was inspired by [137]). The simple integral over $K$ in the first term is naturally defined to be the
standard linear bias, $b_\delta$. The 2nd term, integrating $K\,\Delta x_i$, must be zero by the symmetry of the kernel. The third
term, integrating $K\,\Delta x_i\Delta x_j$, must be zero by symmetry if $i \neq j$, but if $i = j$, the integral for a generic kernel will
give a result of order $R^2$ times the simple integral over the kernel in the first term, i.e., the integral will give a result
of order $\sim b_\delta R^2\,\delta^K_{ij}$. Therefore, we have for the galaxy density

$$\delta_g(\mathbf{x}) = b_\delta\left[\delta(\mathbf{x}) + \frac{1}{2}b_R R^2\,\nabla^2\delta(\mathbf{x})\right], \tag{35}$$



where $b_R$ is of order unity (e.g., if the kernel was a Gaussian with rms width $R$, $b_R$ would be exactly 1). The loophole
in the argument that $b_R \sim 1$ is if the kernel has substantial positive and negative parts, and these are tuned to almost
perfectly cancel in the average over the whole kernel, making the average much smaller than the fluctuations. On the
other hand, if the kernel does not have significant negative parts, one could even argue that $b_R$ should be not just
$O(1)$, but also positive. In Fourier space we have

$$\delta_g(\mathbf{k}) = b_\delta\left[1 - \frac{b_R}{2}R^2k^2\right]\delta(\mathbf{k}), \tag{36}$$

and the power spectrum is

$$P_g(k) = b_\delta^2\left[1 - b_R R^2k^2\right]P(k). \tag{37}$$



Fits including $b_R$ can include a prior that $b_R$ is not much greater than one, and possibly also positive, although
this will only be useful if consideration of galaxy formation physics can place an upper limit on $R$. Note that, if
this program for modeling non-locality is to succeed, $k^2R^2$ becomes a second small parameter, in addition to the
fluctuation amplitude, so it is not necessarily required to include terms simultaneously higher order in both $k^2R^2$
and $\delta$.
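
The role of the kernel's second moment can be illustrated in one dimension. The sketch below assumes a normalized Gaussian kernel of rms width $R$ (for an isotropic three-dimensional Gaussian each Cartesian direction behaves this way) and compares its Fourier transform with the truncated expansion $1 - \frac{1}{2}k^2\langle\Delta x^2\rangle$. The helper names and numerical values are hypothetical.

```python
import math

def trapezoid(f, a, b, n=20000):
    """Composite trapezoid rule for the integral of f over [a, b]."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h

R = 2.0  # kernel rms width, arbitrary units (illustrative value)

def K(x):
    """Normalized 1D Gaussian kernel with rms width R."""
    return math.exp(-0.5 * (x / R) ** 2) / (R * math.sqrt(2.0 * math.pi))

L = 10.0 * R  # integration range; the Gaussian tails beyond it are negligible
norm = trapezoid(K, -L, L)
second_moment = trapezoid(lambda x: x * x * K(x), -L, L)

def window(k):
    """Fourier transform of the kernel (cosine part; the sine part vanishes by symmetry)."""
    return trapezoid(lambda x: K(x) * math.cos(k * x), -L, L)

k = 0.05  # chosen so that k * R = 0.1
exact = window(k)
approx = norm - 0.5 * k * k * second_moment
print(norm, second_moment, exact, approx)
```

The second moment equals $R^2$, and the difference between the full transform and the truncated expansion is of order $k^4R^4$, which is the sense in which $k^2R^2$ acts as a small expansion parameter.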

The reader may be tempted at this point to conclude that all we have found is that short-range non-locality can
be modeled by assuming the galaxy density simply depends on a smoothed version of the density field - expanding
the smoothing kernel would generally produce the same $k^2R^2$ term. The truth is not quite so simple. If we follow
the same procedure on the next, $O(\delta^2)$, term that would appear in Eq. (33), we find not just the new term that
would come from using the square of a smoothed field, $R^2\,\delta\nabla^2\delta$, but also a term $R^2\,(\nabla\delta)\cdot(\nabla\delta)$, with the two terms
generally multiplied by independent bias parameters. Together, these two terms are equivalent to assuming that the
galaxy density depends on both the square of the smoothed density field and, independently, a smoothed version of
the square of the un-smoothed density field. Generally, the correct procedure for representing short-range non-locality
appears to be to write down all possible scalar higher derivative terms, each with its own bias parameter, and a factor
of $R$ for every derivative.
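
The distinction between squaring a smoothed field and smoothing a squared field can be seen in a one-dimensional toy example. The sketch below assumes a Gaussian smoothing kernel of width $R$ and a sinusoidal test field (both are illustrative choices, with hypothetical function names), and checks that the two operations differ at leading order by $R^2\,(d\delta/dx)^2$, the one-dimensional analogue of the $R^2\,(\nabla\delta)\cdot(\nabla\delta)$ term.

```python
import math

def trapezoid(f, a, b, n=4000):
    """Composite trapezoid rule for the integral of f over [a, b]."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h

R = 0.05  # smoothing scale, small compared with the field's wavelength

def K(d):
    """Normalized 1D Gaussian smoothing kernel with rms width R."""
    return math.exp(-0.5 * (d / R) ** 2) / (R * math.sqrt(2.0 * math.pi))

def smooth(f, x):
    """Kernel-weighted average of f around the point x."""
    return trapezoid(lambda d: K(d) * f(x + d), -8.0 * R, 8.0 * R)

def delta(x):
    """Toy density field."""
    return math.sin(x)

def ddelta(x):
    """Derivative of the toy density field."""
    return math.cos(x)

x0 = 0.7
smooth_of_square = smooth(lambda x: delta(x) ** 2, x0)
square_of_smooth = smooth(delta, x0) ** 2
difference = smooth_of_square - square_of_smooth
gradient_term = R ** 2 * ddelta(x0) ** 2
print(difference, gradient_term)
```

The residual between the two printed numbers is of order $R^4$, consistent with treating each additional pair of derivatives as carrying another factor of $R^2$.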

Similar arguments can be made for the noise. If it is correlated on scale $R$, smaller than the scale of observation,
one generically expects the noise power spectrum to look like

$$P_N(k) = \left[1 - N_R R^2k^2\right]N, \tag{38}$$

where $N$ is the usual large scale white noise, and $N_R$ is of order unity for generic noise correlation functions.



B. Redshift Space 



Allowing for redshift-space distortions changes the symmetry considerations that we used to decide which variables
galaxy clustering could depend on. The radial direction can now be special. For example, $\nabla_\parallel v_\parallel$ and $\nabla_\parallel\nabla_\parallel\phi$ are
now allowed in the Taylor series, where $\parallel$ indicates the radial direction. The non-locality kernel in §III A can depend
separately on the radial coordinate as well, which will lead to an $R_\parallel^2 k_\parallel^2$ term. All of these terms generally come with an
unknown bias parameter. None of these considerations are needed in the usual approach to redshift-space distortions
pioneered for galaxies by [138], because the transformation from real to redshift space is applied to the already biased
field, and does not involve any new unknown functions. The Ly$\alpha$ forest represents a counter-example [139, 140],
where the already redshift-distorted optical depth field, $\tau$, undergoes the local non-linear transformation $\exp(-\tau)$
to produce the observed transmitted flux fraction field. While the form of this transformation is completely known,
it applies to the un-smoothed optical depth field, which is sufficiently non-linear that one cannot hope to use the
in-this-case-actually-computable Taylor series coefficients to describe very large scale clustering - the observable bias
parameters will inevitably receive perturbatively un-computable contributions from higher order terms. Consistent
with this picture, [140] showed that the standard [138] form for the large scale redshift-space power spectrum fit the
Ly$\alpha$ forest power spectrum well, as long as the distortion parameter $\beta$ was a free parameter, rather than the usual
$\beta = (d\ln D/d\ln a)\,b_\delta^{-1}$. This is equivalent to introducing a $\nabla_\parallel v_\parallel$ term with a free bias parameter. The cautious reader
may wonder whether the non-linear transformation involved in the usual [138] redshift-space distortion calculation,
when taken to higher order as in [104], may lead to the same problem of renormalization of the standard [138] form
of large-scale power, in a way that might look like velocity bias, for example. We leave this question, and further
consideration of redshift-space distortions in the renormalized bias approach, for future work.



C. Primordial non-Gaussianity 



When considering the model for non-Gaussian initial conditions where $\phi(\mathbf{x}) = \zeta(\mathbf{x}) + f_{NL}\,\zeta^2(\mathbf{x})$, where $\zeta$ is a
Gaussian variable with the primordial power spectrum that we usually associate with $\phi$, [141, 142] found the need
for a bias term directly proportional to $\zeta$, which looks for practical purposes like a direct dependence on $\phi$ (see also
[143]). This may seem inconsistent with the considerations of this paper, where we excluded any direct dependence
on $\phi$. The explanation for this is that $\zeta$ does not obey the principle that led us to exclude dependence on $\phi$ - a
homogeneous change in $\zeta$ is observable, essentially as a change in the primordial power spectrum amplitude, which
of course affects galaxy formation and clustering. This answers the question that was unanswered in [142]: whether
the term should be considered to be a bias parameter multiplying $\zeta$ or $\phi$ - it makes no difference at lowest order, but
if a higher order calculation is needed, the answer clearly is that the dependence should be on $\zeta$.



IV. CONCLUSIONS 



The central result of this paper is Eq. (27), which shows the most general galaxy power spectrum that can be
derived starting from expanding the galaxy density as a Taylor series in the local values of $\delta$, $\partial_i v_j$, and $\partial_i\partial_j\phi$. This power
spectrum depends on only two new parameters, beyond the usual linear bias, shot-noise, and 2nd order density bias.
One of the parameters quantifies 2nd order dependence on the magnitude of the tidal tensor (or, equivalently after
reparameterizations, the difference between velocity divergence and density), and the other parameter multiplies a
set of 3rd order terms that collectively appear as a $k$-dependent bias proportional to the linear variance on scale $k$.
Eq. (27) allows for cross-correlating different types of galaxies, each with its own set of bias parameters, but the
power spectrum of a single type of galaxy can be obtained from it by simply setting the bias parameters for the two
types equal to each other. We also give the cross-spectrum between mass and galaxies explicitly, in Eq. (23)
(this can of course be obtained from Eq. (27) by setting the linear bias to 1 and all of the other bias parameters
to zero for one type of galaxy). In Eq. (31) we give the bispectrum of galaxies in this model, which includes new
dependence on the 2nd order tidal tensor term. Eq. (31) also shows how including a Gaussian white noise variable
$\epsilon$ as an expansion variable in the original Taylor series for galaxy density allows for reproduction of the non-trivial
appearance of Poisson-sampling noise in the bispectrum, or more general non-trivial noise properties. In §III A we
explain how short-range non-locality (from hydrodynamics or highly non-linear galaxy formation) can be modeled as
a derivative expansion.

Since no symmetry prevents it, the galaxy density should have at least some small dependence on these new 
terms - the question is just how much. It might have been easy to miss this dependence in past studies [56], as the $k$
dependence is not enormously different from the density-only model, and appears in a range of scales where deviations 
from the density-only model could be interpreted as even higher order effects, or confused with shot-noise. The new 
terms may not all be necessary, even at the level of future precision data, but this should be demonstrated, not simply 
assumed, i.e., it would be good if they were all considered and bounded. To distinguish the terms in simulations, it 
will be useful to look at the mass-galaxy power spectrum, the galaxy-galaxy power spectrum, and the bispectrum 
simultaneously (ideally even the bispectra mixing mass and galaxies, which will be simple to write down, and higher 
order statistics). 

While one can freely marginalize over the parameters of the extended model when interpreting future high precision 
clustering measurements, one can also think of this general model as a framework for interpreting numerical simulations 
or other specific models of galaxy/halo formation. For a long time, the linear bias parameter has been a useful way to 
condense simulation predictions for very large-scale clustering into one number per type of object, rather than simply 
reporting results for free functions $P_g(k)$. The parameters of the perturbative model should similarly be a useful
way to condense perturbative-scale clustering down to a small, well-motivated, set of numbers, rather than discussing
scale dependent bias as a free function $b(k)$ (or parameterizing it in arbitrary ways [26]). In the most optimistic
case, both the halo-based approach and the PT approach will work very well, and be complementary in that the PT
approach will provide a clear set of large-scale parameters to be calibrated by the halo-based approach that includes
smaller-scale information.

Some other questions for followup work include: 

• Are there other terms that we should include? 

• The equivalences that led to the need for only a single bias parameter at 3rd order should be investigated
further. It seems likely that there are relations like $\eta_2 = \frac{2}{7}s_1^2 - \frac{4}{21}\delta_1^2$ which we have not taken into account.
Note that some of the parameters that are unnecessary in the present calculations may become necessary when
calculations are done to higher order.

• Redshift-space distortions, touched on in §III B, should be computed explicitly within the renormalized bias
model.

• While this property has not been exploited very well in the past, the scale where PT breaks down should be 
internally determinable. If calculations are pushed to at least one higher order, the breakdown scale should 
be evident as the place where the difference between the two highest orders calculated starts to matter. In 
the past, PT has acquired a reputation for limited accuracy because this kind of testing has not been done, 
while the calculations were pushed beyond the point where there was good reason to expect them to work well. 
In the future, very-high-precision world, the primary concern for PT should not be simple breakdown of the 
perturbative expansion, but instead insufficiently general modeling, e.g., missing terms like the ones in this 
paper. High precision goodness-of-fit tests should also help establish reliability. 

• The connection between this approach to bias and renormalization group/resummation approaches to the 
non-linear mass clustering [MEM EM EM EM EM EM EM EM EMdEM could be considered. [M] showed 
that our approach to bias works well when compared to simulations as long as PT describes the mass power 
spectrum well |l55l | , but it isn't clear what one should do when standard PT no longer describes the mass power 
well, but more sophisticated methods do. 

• Time evolution of bias can be considered from the point of view of this paper [M EM EM, EM, EM EM EM EM Eli. 

• One clear loophole in all of these arguments exists if long-range effects of radiation sources affect galaxy 
clustering, e.g., through reionization [HI Ell EH (long enough range to make $k^2 R^2$ not a good expansion parameter). 
If these effects are small, some perturbative method can probably be used, but it would have to be something 
outside the scope of this paper. 

• Eventually, one may want to correlate properties of galaxies other than density, e.g., ellipticity, galaxy 
orientation, etc. (IM EM EM EM EM ESI, EM EM EM EM, EM EM, EM EM]. These correlations should be 
describable by a similar approach to the one here, except with modified symmetry considerations. For example, 
a traceless tensor observable can be linearly related to $s_{ij}(\mathbf{x})$ by a scalar bias parameter, but not to $\delta(\mathbf{x})$. 
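
The internal breakdown test described in the list above (comparing the two highest orders calculated) could be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `breakdown_scale`, the 1% tolerance, and the toy power spectra are hypothetical and are not results or code from this paper. In practice the tolerance would be set by the statistical errors of the survey.

```python
import numpy as np

def breakdown_scale(k, p_lower, p_higher, tol=0.01):
    """Smallest k at which two successive PT orders differ by more than tol.

    k        : increasing array of wavenumbers
    p_lower  : prediction at the second-highest order calculated
    p_higher : prediction at the highest order calculated
    Returns None if the two orders agree to within tol on the whole grid.
    """
    k = np.asarray(k, dtype=float)
    p_lower = np.asarray(p_lower, dtype=float)
    p_higher = np.asarray(p_higher, dtype=float)
    frac = np.abs(p_higher - p_lower) / np.abs(p_higher)
    bad = np.nonzero(frac > tol)[0]
    return None if bad.size == 0 else float(k[bad[0]])

# Toy inputs (hypothetical shapes, chosen only to exercise the function):
# each extra order adds a correction suppressed by another power of (k/k_nl)^2.
k = np.linspace(0.01, 0.5, 50)
k_nl = 0.3
p_lin = k ** -1.5
p_n = p_lin * (1.0 + (k / k_nl) ** 2)
p_np1 = p_n + p_lin * (k / k_nl) ** 4

k_break = breakdown_scale(k, p_n, p_np1, tol=0.01)
print("orders differ by more than 1% for k >=", k_break)
```

The point of the sketch is that the breakdown scale is read off from the calculation itself, without reference to simulations.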

Finally, the background motivation for this work deserves re-emphasis: Fig. 1 shows that future redshift surveys 
will contain orders of magnitude more information than present surveys. Fig. 2 shows that there will be a wide range 
of scales (e.g., very roughly a factor of 4 in $k$ or 64 in number of modes), in which corrections to linear theory will 
be necessary but still fractionally small, i.e., amenable to a perturbative treatment, for realistic planned surveys. To 
exploit this information optimally will require rigorous modeling of clustering, far beyond what has been done in the 
past. 
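
The factor of 64 quoted above follows from the number of Fourier modes growing as $k_{\rm max}^3$ at fixed volume, so a factor of 4 in $k$ gives $4^3 = 64$. A minimal sketch, assuming the standard continuum count $N \simeq V k_{\rm max}^3 / 6\pi^2$ for all modes with $|\mathbf{k}| < k_{\rm max}$; the function name `n_modes` and the two $k$ values are hypothetical, and the volume is borrowed from the survey size quoted in the Fig. 4 caption purely for illustration:

```python
import math

def n_modes(volume, k_max):
    """Approximate number of Fourier modes with |k| < k_max in a volume V.

    Uses N = V * (4 pi / 3) k_max^3 / (2 pi)^3 = V k_max^3 / (6 pi^2),
    counting every wavevector (no factor of 1/2 for a real field).
    """
    return volume * k_max ** 3 / (6.0 * math.pi ** 2)

volume = 100.0 * 1.0e9   # 100 (Gpc/h)^3 expressed in (Mpc/h)^3
k_lo, k_hi = 0.05, 0.2   # illustrative values, a factor of 4 apart, in h/Mpc

ratio = n_modes(volume, k_hi) / n_modes(volume, k_lo)
print("modes gained by extending k_max by a factor of 4:", ratio)
```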

We thank Roman Scoccimarro for suggesting that we consider dependence on $\nabla_i \nabla_j \phi$ in addition to $\theta$, and Adam 
Lidz, Neal Dalal, and Latham Boyle for helpful discussions. PM acknowledges support of the Beatrice D. Tremaine 
Fellowship. 



V. APPENDIX: PT BASICS 



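Standard gravitational PT is well-described in, e.g., [60]. Here we list some of the relevant facts that we use. 
The density perturbations are given by 

$$
\begin{aligned}
\delta(\mathbf{k}) = \delta_1(\mathbf{k}) &+ \int \frac{d^3\mathbf{q}}{(2\pi)^3}\, \delta_1(\mathbf{q})\, \delta_1(\mathbf{k}-\mathbf{q})\, F_S^{(2)}(\mathbf{q}, \mathbf{k}-\mathbf{q}) \\
&+ \int \frac{d^3\mathbf{q}_1}{(2\pi)^3} \frac{d^3\mathbf{q}_2}{(2\pi)^3}\, \delta_1(\mathbf{q}_1)\, \delta_1(\mathbf{q}_2)\, \delta_1(\mathbf{k}-\mathbf{q}_1-\mathbf{q}_2)\, F_S^{(3)}(\mathbf{q}_1, \mathbf{q}_2, \mathbf{k}-\mathbf{q}_1-\mathbf{q}_2)
\end{aligned}
\tag{39}
$$

with 

$$
F_S^{(2)}(\mathbf{k}_1, \mathbf{k}_2) = \frac{5}{7} + \frac{1}{2} \frac{\mathbf{k}_1 \cdot \mathbf{k}_2}{k_1 k_2} \left( \frac{k_1}{k_2} + \frac{k_2}{k_1} \right) + \frac{2}{7} \left( \frac{\mathbf{k}_1 \cdot \mathbf{k}_2}{k_1 k_2} \right)^2
\tag{40}
$$

and 

$$
\begin{aligned}
F^{(3)}(\mathbf{q}_1, \mathbf{q}_2, \mathbf{q}_3) = \frac{1}{18} \Big[ & 2 \beta(\mathbf{q}_1, \mathbf{q}_2+\mathbf{q}_3)\, G^{(2)}(\mathbf{q}_2, \mathbf{q}_3) + 7 \alpha(\mathbf{q}_1, \mathbf{q}_2+\mathbf{q}_3)\, F^{(2)}(\mathbf{q}_2, \mathbf{q}_3) \\
& + \left[ 2 \beta(\mathbf{q}_1+\mathbf{q}_2, \mathbf{q}_3) + 7 \alpha(\mathbf{q}_1+\mathbf{q}_2, \mathbf{q}_3) \right] G^{(2)}(\mathbf{q}_1, \mathbf{q}_2) \Big] .
\end{aligned}
\tag{41}
$$

Note that this $F^{(3)}$ is un-symmetrized, while Eq. (39) requires a symmetrized version (which we always indicate with 
a subscript $S$). To symmetrize, average over all possible positionings of the arguments. See below for definitions of 
the component functions. 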
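
The symmetrization step can be made concrete with a short numerical sketch. The following Python is an illustration only, not the code used for this paper. It assumes the un-symmetrized kernels of Eq. (41) and Eqs. (47)-(50) as written in this appendix; the function names (`alpha`, `beta`, `F2`, `G2`, `F3`, `symmetrize`) and the test wavevectors are hypothetical choices made for the example.

```python
import itertools
import numpy as np

def alpha(q, k):
    # Eq. (49): (q + k).q / q^2
    return np.dot(q + k, q) / np.dot(q, q)

def beta(q, k):
    # Eq. (50): |q + k|^2 (q.k) / (2 q^2 k^2)
    s = q + k
    return np.dot(s, s) * np.dot(q, k) / (2.0 * np.dot(q, q) * np.dot(k, k))

def F2(q1, q2):
    # Eq. (47), un-symmetrized 2nd order density kernel
    return (5.0 * alpha(q1, q2) + 2.0 * beta(q1, q2)) / 7.0

def G2(q1, q2):
    # Eq. (48), un-symmetrized 2nd order velocity-divergence kernel
    return (3.0 * alpha(q1, q2) + 4.0 * beta(q1, q2)) / 7.0

def F3(q1, q2, q3):
    # Eq. (41), un-symmetrized 3rd order density kernel
    q23 = q2 + q3
    q12 = q1 + q2
    first = 2.0 * beta(q1, q23) * G2(q2, q3) + 7.0 * alpha(q1, q23) * F2(q2, q3)
    second = (2.0 * beta(q12, q3) + 7.0 * alpha(q12, q3)) * G2(q1, q2)
    return (first + second) / 18.0

def symmetrize(kernel, *qs):
    # Average the kernel over all orderings of its arguments.
    perms = list(itertools.permutations(qs))
    return sum(kernel(*p) for p in perms) / len(perms)

# Arbitrary test wavevectors (illustrative values only), chosen so that no
# pair sums to zero, where individual terms of Eq. (41) would divide by zero.
q1 = np.array([0.10, 0.02, -0.05])
q2 = np.array([-0.03, 0.20, 0.07])
q3 = np.array([0.04, -0.06, 0.15])

F3_S = symmetrize(F3, q1, q2, q3)
F3_S_reordered = symmetrize(F3, q3, q1, q2)   # same value by construction

# Three equal, collinear arguments: every ordering gives the same number.
q = np.array([0.0, 0.0, 0.1])
F3_collinear = symmetrize(F3, q, q, q)
print(F3_S, F3_S_reordered, F3_collinear)
```

For three equal, collinear arguments, direct substitution into Eq. (41) gives $81/18 = 4.5$, which is a convenient check of an implementation. The same `symmetrize` helper applies unchanged to any other un-symmetrized kernel with the same calling convention.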

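The velocity divergence $\theta$ is given by a similar expansion with the kernels $F^{(N)}$ replaced by the following kernels 

$$
G_S^{(2)}(\mathbf{k}_1, \mathbf{k}_2) = \frac{3}{7} + \frac{1}{2} \frac{\mathbf{k}_1 \cdot \mathbf{k}_2}{k_1 k_2} \left( \frac{k_1}{k_2} + \frac{k_2}{k_1} \right) + \frac{4}{7} \left( \frac{\mathbf{k}_1 \cdot \mathbf{k}_2}{k_1 k_2} \right)^2
\tag{42}
$$

and 

$$
\begin{aligned}
G^{(3)}(\mathbf{q}_1, \mathbf{q}_2, \mathbf{q}_3) = \frac{1}{18} \Big[ & 6 \beta(\mathbf{q}_1, \mathbf{q}_2+\mathbf{q}_3)\, G^{(2)}(\mathbf{q}_2, \mathbf{q}_3) + 3 \alpha(\mathbf{q}_1, \mathbf{q}_2+\mathbf{q}_3)\, F^{(2)}(\mathbf{q}_2, \mathbf{q}_3) \\
& + \left[ 6 \beta(\mathbf{q}_1+\mathbf{q}_2, \mathbf{q}_3) + 3 \alpha(\mathbf{q}_1+\mathbf{q}_2, \mathbf{q}_3) \right] G^{(2)}(\mathbf{q}_1, \mathbf{q}_2) \Big] .
\end{aligned}
\tag{43}
$$

Again, note that this is un-symmetrized. 

To represent the difference between $\theta$ and $\delta$, we define $D^{(N)} = G^{(N)} - F^{(N)}$, 

$$
D_S^{(2)}(\mathbf{k}_1, \mathbf{k}_2) = \frac{2}{7} \left[ \frac{(\mathbf{k}_1 \cdot \mathbf{k}_2)^2}{k_1^2 k_2^2} - 1 \right] = \frac{2}{7} \left[ S(\mathbf{k}_1, \mathbf{k}_2) - \frac{2}{3} \right] ,
\tag{44}
$$

where we have defined $S$ to represent Fourier space products of the operator $\gamma_{ij}$, 

$$
\gamma_{ij}(\mathbf{q})\, \gamma_{ji}(\mathbf{k}) = S(\mathbf{q}, \mathbf{k}) = \frac{(\mathbf{q} \cdot \mathbf{k})^2}{k^2 q^2} - \frac{1}{3} .
\tag{45}
$$

Note that 

$$
\int_{-1}^{1} d\mu\, S(\mathbf{k}, \mathbf{q}) = 0 ,
\tag{46}
$$

where $\mu = \mathbf{k} \cdot \mathbf{q} / k q$. 

The un-symmetrized 2nd order kernels (appearing in the 3rd order kernels) are 

$$
F^{(2)}(\mathbf{q}_1, \mathbf{q}_2) = \frac{1}{7} \left[ 5 \alpha(\mathbf{q}_1, \mathbf{q}_2) + 2 \beta(\mathbf{q}_1, \mathbf{q}_2) \right]
\tag{47}
$$

and 

$$
G^{(2)}(\mathbf{q}_1, \mathbf{q}_2) = \frac{1}{7} \left[ 3 \alpha(\mathbf{q}_1, \mathbf{q}_2) + 4 \beta(\mathbf{q}_1, \mathbf{q}_2) \right] ,
\tag{48}
$$

where, finally, 

$$
\alpha(\mathbf{q}, \mathbf{k}) = \frac{(\mathbf{q} + \mathbf{k}) \cdot \mathbf{q}}{q^2}
\tag{49}
$$

and 

$$
\beta(\mathbf{q}, \mathbf{k}) = \frac{|\mathbf{q} + \mathbf{k}|^2\, \mathbf{q} \cdot \mathbf{k}}{2 q^2 k^2} .
\tag{50}
$$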
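
As a consistency check on the 2nd order relations in this appendix, the sketch below symmetrizes Eqs. (47) and (48) numerically and compares the result with the closed forms of Eqs. (40), (42) and (44). It also checks Eqs. (45) and (46). This is a minimal illustration, not the code used for the paper: the function names and test wavevectors are hypothetical, and `gamma` assumes the Fourier-space form $\gamma_{ij}(\mathbf{k}) = k_i k_j / k^2 - \delta_{ij}/3$ implied by the definition of $s_{ij}$ in the abstract.

```python
import numpy as np

def alpha(q, k):
    # Eq. (49)
    return np.dot(q + k, q) / np.dot(q, q)

def beta(q, k):
    # Eq. (50)
    s = q + k
    return np.dot(s, s) * np.dot(q, k) / (2.0 * np.dot(q, q) * np.dot(k, k))

def F2(q1, q2):
    # Eq. (47), un-symmetrized
    return (5.0 * alpha(q1, q2) + 2.0 * beta(q1, q2)) / 7.0

def G2(q1, q2):
    # Eq. (48), un-symmetrized
    return (3.0 * alpha(q1, q2) + 4.0 * beta(q1, q2)) / 7.0

def symmetrize2(kernel, k1, k2):
    # Average over the two orderings of the arguments.
    return 0.5 * (kernel(k1, k2) + kernel(k2, k1))

def cos_angle(k1, k2):
    return np.dot(k1, k2) / (np.linalg.norm(k1) * np.linalg.norm(k2))

def second_order_closed(c0, c2, k1, k2):
    # Common closed form of Eqs. (40) and (42): c0 = 5/7, c2 = 2/7 for F;
    # c0 = 3/7, c2 = 4/7 for G.
    mu = cos_angle(k1, k2)
    r = np.linalg.norm(k1) / np.linalg.norm(k2)
    return c0 + 0.5 * mu * (r + 1.0 / r) + c2 * mu ** 2

def S(q, k):
    # Eq. (45)
    return cos_angle(q, k) ** 2 - 1.0 / 3.0

def gamma(k):
    # Assumed operator form: k_i k_j / k^2 - delta_ij / 3
    return np.outer(k, k) / np.dot(k, k) - np.eye(3) / 3.0

# Arbitrary test wavevectors (illustrative values only).
k1 = np.array([0.10, 0.02, -0.05])
k2 = np.array([-0.03, 0.20, 0.07])

F2_S = symmetrize2(F2, k1, k2)
G2_S = symmetrize2(G2, k1, k2)
D2_S = G2_S - F2_S                                # left side of Eq. (44)
S_from_gamma = np.trace(gamma(k1) @ gamma(k2))    # gamma_ij(q) gamma_ji(k)

# Eq. (46): Gauss-Legendre quadrature is exact for this quadratic in mu.
nodes, weights = np.polynomial.legendre.leggauss(4)
S_mu_integral = np.sum(weights * (nodes ** 2 - 1.0 / 3.0))

print(F2_S, G2_S, D2_S, S_from_gamma, S_mu_integral)
```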
FIG. 4: Effect of various kinds of bias on the auto-power spectrum of a single type of galaxy (Eq. 27), at $z = 1$. The 
black (solid) line shows the term proportional to $b_{\delta^2}$, red (dashed) shows $b_{s^2}$, and green (dotted) shows $b_3$, with values of 
the coefficients labeling the curves (all of the other coefficients are zero in each case). The blue (long-dashed) line shows the 
effect of $N$ (white noise), when similarly normalized by the mass power spectrum. The coefficient values are largely arbitrary, 
i.e., the lines are only intended to show the shape of the effect, not to imply anything about the magnitude. The error bars 
show approximate fractional errors on band power measurements from a 100 $(\mathrm{Gpc}/h)^3$ survey (e.g., $\sim 3/4$ of the sky at 
$1 < z < 2$). 
FIG. 5: Quantities contributing to the reduced bispectrum, $Q_g$ (Eq. 31), as a function of $\mu_{12} = \mathbf{k}_1 \cdot \mathbf{k}_2 / k_1 k_2$. The left 
panel shows $k_1 = 0.1\, h\,\mathrm{Mpc}^{-1}$, $k_2 = 0.2\, h\,\mathrm{Mpc}^{-1}$, while the right shows $k_1 = k_2 = 0.1\, h\,\mathrm{Mpc}^{-1}$ (in this case, recall that 
$\mathbf{k}_3 = -(\mathbf{k}_1 + \mathbf{k}_2)$, so $k_3 = 0$ when $\mu_{12} = -1$). The blue, dot-dashed curve shows the mass bispectrum $Q_m$, the black, solid 
lines represent $b_{\delta^2}$, and the red, dashed curves are the new $b_{s^2}$ term (new to Eulerian bias models, although already present 
in Lagrangian bias models [60l . Ill8l . Ill9| ]). The parameter values were chosen to match the power spectrum figures. 



